MS324 Waves, diffusion and variational principles


The Open University 


Block II Random walks and diffusion


This publication forms part of an Open University course. Details of this and 
other Open University courses can be obtained from the Student Registration 
and Enquiry Service, The Open University, PO Box 197, Milton Keynes 
MK7 6BJ, United Kingdom: tel. +44 (0)845 300 6090, email 
general-enquiries@open.ac.uk 


Alternatively, you may visit the Open University website at 
http://www.open.ac.uk where you can learn more about the wide range of 
courses and packs offered at all levels by The Open University. 


To purchase a selection of Open University course materials visit 
http://www.ouw.co.uk, or contact Open University Worldwide, Walton Hall, 
Milton Keynes MK7 6AA, United Kingdom, for a brochure: tel. +44 (0)1908 
858793, fax +44 (0)1908 858787, email ouw-customer-services@open.ac.uk 


The Open University, Walton Hall, Milton Keynes, MK7 6AA. 
First published 2005. Second edition 2010. 
Copyright © 2005, 2010 The Open University 


All rights reserved. No part of this publication may be reproduced, stored in a 
retrieval system, transmitted or utilised in any form or by any means, electronic, 
mechanical, photocopying, recording or otherwise, without written permission from 
the publisher or a licence from the Copyright Licensing Agency Ltd. Details of such 
licences (for reprographic reproduction) may be obtained from the Copyright 
Licensing Agency Ltd, Saffron House, 6-10 Kirby Street, London EC1N 8TS; 
website http://www.cla.co.uk. 


Open University course materials may also be made available in electronic formats 
for use by students of the University. All rights, including copyright and related 
rights and database rights, in electronic course materials and their contents are 
owned by or licensed to The Open University, or otherwise used by The Open 
University as permitted by applicable law. 


In using electronic course materials and their contents you agree that your use will 
be solely for the purposes of following an Open University course of study or 
otherwise as licensed by The Open University or its assigns. 


Except as permitted above you undertake not to copy, store in any medium 
(including electronic storage or use in a website), distribute, transmit or retransmit, 
broadcast, modify or show in public such electronic materials in whole or in part 
without the prior written consent of The Open University or in accordance with the 
Copyright, Designs and Patents Act 1988. 


Edited, designed and typeset by The Open University, using the Open University 
TeX System. 
Printed in the United Kingdom by Cambrian Printer Limited, Aberystwyth. 


ISBN 978 0 7492 5168 0 


Z.1 

The paper used in this publication contains pulp sourced from forests independently certified 
to the Forest Stewardship Council (FSC) principles and criteria. Chain of custody certification 
allows the pulp from these forests to be tracked to the end use (see www.fsc-uk.org). 

Cert no. TT-COC-2200 
www.fsc.org 
© 1996 Forest Stewardship Council 


Contents 


INTRODUCTION AND OVERVIEW 


CHAPTER 1 PROBABILITY AND STATISTICS 
1.1 Probability 
1.1.1 Probability for discrete events 
1.1.2 Probabilities for combinations of events 
1.1.3 Probabilities for successive trials 
1.1.4 Continuous random variables 
1.1.5 Two or more random variables 
1.2 Statistics 
1.2.1 Statistics of a single random variable 
1.2.2 Statistics involving two or more random variables 
1.3 The normal distribution 
1.3.1 The error function 
1.4 Outcomes 
1.5 Further Exercises 


Solutions to Exercises in Chapter 1 


CHAPTER 2 DISCRETE RANDOM FUNCTIONS AND RANDOM WALKS 
2.1 Introduction 
2.2 An elementary random function 
2.2.1 Defining the coin-tossing function 
2.2.2 Statistics of the coin-tossing function 
2.3 Random walks 
2.3.1 Definition and mathematical description 
2.3.2 Some examples of random walks 
2.4 Statistics of random walks 
2.5 Probability distribution of a random walk 
2.6 An approximate form for the probability distribution 
2.7 Relationship with the diffusion equation 
2.8 A random function with correlations 
2.9 Summary and discussion 
2.10 Outcomes 
2.11 Further Exercises 


Solutions to Exercises in Chapter 2 


CHAPTER 3 THE DIFFUSION EQUATION 
3.1 Introduction 
3.2 Diffusion 
3.3 Concentration and flux density 
3.3.1 Concentration 
3.3.2 Molar concentration 
3.3.3 Flux and flux density 
3.3.4 Relating flux to the vector flux density 
3.3.5 Defining flux density in one dimension 
3.3.6 Summary of concentration and flux density 
3.4 The continuity equation 
3.4.1 The continuity equation in one dimension 
3.4.2 The continuity equation in three dimensions 
3.4.3 Relation with Gauss’s theorem 
3.5 The diffusion equation 
3.6 The heat equation 
3.7 A solution of the diffusion equation 
3.8 Summary and outcomes 
3.9 Further Exercises 
3.10 Appendix: observing diffusion in a simple experiment 


Solutions to Exercises in Chapter 3 


CHAPTER 4 SOLUTIONS OF THE DIFFUSION EQUATION 


4.1 Introduction 
4.2 Diffusion in an infinite medium 


4.3 Solution in a finite medium in one dimension 
4.3.1 Separation of variables 
4.3.2 Identifying eigenfunctions 
4.3.3 General solution 
4.3.4 Orthogonality relation 
4.3.5 Calculation of Fourier coefficients 


4.4 Orthogonality of eigenfunctions 
4.4.1 Eigenfunctions and eigenvalues 
4.4.2 Orthogonality of functions 
4.4.3 Orthogonality relation 
4.5 Diffusion in a finite medium in two and three dimensions 
4.5.1 The Helmholtz equation 
4.5.2 A solution of the Helmholtz equation in two dimensions 
4.5.3 Orthogonality and generalised Fourier series 


4.6 Semi-infinite domains: temperature waves 
4.7 Outcomes 
4.8 Further exercises 


Solutions to Exercises in Chapter 4 


CHAPTER 5 THE CENTRAL LIMIT THEOREM 
5.1 Introduction 
5.2 The central limit theorem 
5.3 Distribution of sums of random variables 
5.4 The Gaussian approximation to [f(x)]^N (Optional) 
5.5 Fourier transform of a probability density (Optional) 
5.6 Summary 
5.7 Outcomes 
5.8 Further exercises 


Solutions to Exercises in Chapter 5 


CHAPTER 6 MICROSCOPIC DERIVATION OF THE DIFFUSION EQUATION 
6.1 Introduction 
6.2 Continuous random walks 
6.2.1 The continuous random walk in one dimension 
6.2.2 Random walks in two and three dimensions 
6.3 From random walks to the diffusion equation 
6.4 The Fokker–Planck equation 
6.4.1 Generalised diffusion processes 
6.4.2 Probability density for generalised diffusion 
6.4.3 Derivation of the Fokker–Planck equation 
6.4.4 The Fokker–Planck equation in three dimensions 
6.5 Summary and outcomes 
6.6 Further Exercises 


Solutions to Exercises in Chapter 6 


INDEX 



Introduction and overview 


This block is designed as an introduction to two related topics. 


• Random functions and random walks 


The block introduces techniques used to model many of the types of ran- 
dom processes which occur in the natural world. It reviews the concepts 
of probability, random numbers and statistics, and their application to 
unpredictable phenomena. Sometimes it is desirable to understand how 
random quantities vary as a function of time (or some other variable), 
and for this reason it is useful to be able to define random functions, 
as well as random variables. We shall see how various types of random 
functions can be defined. One of these, the random walk, is a particularly 
important model, and is discussed in detail. 


• Diffusion and the heat equation 


Diffusion is a process in which materials are mixed by random micro- 
scopic motion of atoms, without any large scale (macroscopic) motion 
(such as occurs when a liquid is stirred). Diffusion can be observed by 
dropping a pellet of dye into a glass bowl containing water which is left 
standing upon a cool floor. The dye dissolves, and over a period of sev- 
eral hours it gradually spreads away from the point where the dye pellet 
was standing. This block introduces the diffusion equation, which gives 
a deterministic (non-random) description of how the dye spreads. It also 
presents a derivation of the diffusion equation from the random micro- 
scopic motions of individual atoms, using the statistics of the random 
walk process. 


Processes described by the diffusion equation and closely related equa- 
tions occur in many different areas of science. One application of the 
diffusion equation is to describe the flow of heat: it determines how the 
temperature of a body varies as a function of position and time. For 
this reason the diffusion equation is often called the heat equation, and 
many of the exercises are problems relating to the flow of heat. 


Examples of random processes 


This block is largely concerned with modelling random processes (also called 
stochastic processes). The techniques and concepts of probability and statis- 
tics are the tools required for understanding unpredictable events, so an el- 
ementary knowledge of these is required. The necessary concepts from the 
theory of probability and statistics are discussed in Chapter 1. 


Before discussing this background material, it may be useful to describe 
some of the random processes to which these techniques might be applied. 
Figure 0.1 shows the number N of people standing in a queue, displayed 
as a function of time t. It is a mapping from the real line to the set of 
positive integers, which jumps by ±1 at random times. The number of 
people standing in the queue at any one time is unpredictable. In order 
to describe this situation mathematically, it is necessary to develop models 
for random or stochastic functions, where the value of a function N(t) is 
assigned by some random process. 


[Figure: step plot of N(t) against time t] 


Figure 0.1 Number of people standing in a queue 


There are many different types of stochastic processes, and some further 
examples are shown in the figures below. Figure 0.2 is a plot of the output 
voltage V of a Geiger counter, plotted as a function of time. The device 
produces a pulse whenever it detects a particle produced in a radioactive-decay 
process. The pulses all have the same shape, but the times at which 
they are produced are completely random. 


[Figure: pulses of V(t) plotted against time t] 


Figure 0.2 Output from a Geiger counter 


Figure 0.3 shows the temperature recorded at midnight at a weather station 
in Cape Town on a succession of days starting on 1 January 2004. There 
are apparently unpredictable variations from the average. Even so, the 
successive numbers appear to be correlated, in that a day with a lower-than- 
average temperature is usually followed by another cooler-than-average day. 


[Figure: temperature at 00.00 GMT (centigrade) plotted against days from 1st January, with the average marked] 


Figure 0.3 Temperature recorded at midnight in Cape Town on a series of 
January nights; the average value is shown for comparison. The values on 
successive days are correlated. 


Figure 0.4 is a photograph of the surface of an ocean on a windy day. The 
height h of the surface (at a particular instant in time) is modelled by a 
continuous function of position (x,y) in the horizontal plane. An example 
of such a function is shown in Figure 0.5. Since h(x, y) is a continuous 
function, knowing the height at one position gives an indication of what 
it might be at a nearby position. This is therefore another example of a 
random function with correlations. 


Figure 0.4 Photograph of the mid-Atlantic Ocean (at sunset), showing a typical 
surface pattern 


[Figure: surface plot of h(x, y)] 


Figure 0.5 The height h(x, y) of the ocean surface on a windy day is a random 
function of position 


Figure 0.6 is a plot of one Cartesian component x(t) of the position of a tiny 
dust particle moving in still water. The particle has a random motion (called 
Brownian motion) due to the fact that it is being jostled by movement of 
the water molecules. Like Figure 0.5, this is a mapping from one continuous 
variable to another, but it is very different in character. First, this is not a 
smoothly varying function. (The function is continuous, but its derivative 
is not well defined at any point, that is, it is nowhere differentiable.) Also, 
the magnitude of x(t) is tending to increase, whereas the typical value of the 
function plotted in the previous examples does not change as time progresses. 
This example is modelled by a type of random function called a random walk, 
which is discussed in some detail in this block. 


[Figure: x(t) plotted against time t] 


Figure 0.6 Brownian motion: one Cartesian component of the displacement of a 
tiny dust particle in still water as a function of time 


Our final example, shown in Figure 0.7, is a plot of the value of a market 
price as a function of time, namely the closing price of gold (in US dol- 
lars per ounce) on a sequence of 996 trading days, from 1 January 1998 to 
27 December 2001. Its character is clearly similar to Figure 0.6, in that the 
fluctuations are also very erratic, and there is no reason for the function 
to remain close to any particular value. The time-dependence of prices of 
commodities, shares and other traded assets is often modelled by a type of 
random walk. 


[Figure: price of gold (USD/ounce) plotted against trading days from 1st January 1998] 


Figure 0.7 The price of gold (in US dollars per ounce) over the four years from 
January 1998 to December 2001 


Many other examples of random processes can be observed in everyday life. 
Of course, it is (by definition) impossible to predict the precise course of 
these random processes. But it is an important task for applied mathematicians 
to be able to describe them as fully as possible. This requires the use 
of statistical techniques, in which these processes are modelled by random 
functions, which are introduced in Chapter 2. 


The relation between random and deterministic 
processes 


There is one type of situation in which random motions do lead to predictable 
results. This happens when we observe the motion of a large number 
of randomly moving particles. When the numbers of particles are very 
large, the number of particles in a given region of space, N say, can be predicted 
very accurately (in the sense that the ratio of the error $\delta N$ to N is 
small, although $\delta N$ itself may be very large). In these situations, accurate 
predictions are possible even though the motions of individual particles are 
random. The block discusses the derivation and solution of the diffusion 
equation, a partial differential equation which describes the motion of large 
numbers of randomly moving particles. 
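
The claim that the relative error becomes small while the absolute error grows can be checked with a short simulation. The sketch below is only an illustration under an assumed toy model: each particle is placed in the region independently with probability one half, and the function name `rms_fluctuation` and its parameters are hypothetical, not taken from the course text.

```python
import math
import random


def rms_fluctuation(num_particles, trials=50, p_inside=0.5, seed=0):
    """Estimate the typical error dN in the number N of particles in a region.

    Assumed toy model: each of num_particles particles lies in the region
    independently with probability p_inside.
    Returns (rms dN, rms dN divided by the mean value of N).
    """
    rng = random.Random(seed)
    mean_n = num_particles * p_inside  # expected number of particles in the region
    squared_errors = []
    for _ in range(trials):
        n = sum(1 for _ in range(num_particles) if rng.random() < p_inside)
        squared_errors.append((n - mean_n) ** 2)
    delta_n = math.sqrt(sum(squared_errors) / trials)
    return delta_n, delta_n / mean_n


if __name__ == "__main__":
    for total in (100, 10_000, 100_000):
        delta_n, ratio = rms_fluctuation(total, trials=20)
        print(f"particles = {total:>7}   dN ~ {delta_n:8.1f}   dN/N ~ {ratio:.4f}")
```

In this toy model the error grows roughly like the square root of the number of particles, so the ratio of the error to N shrinks as the system gets larger.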


Chapter 3 introduces the diffusion equation, describing diffusion as a macro- 
scopic phenomenon, giving a derivation based upon a plausible assumption, 
which is sometimes known as Fick’s law. Chapter 4 explores the solutions of 
the diffusion equation; here many of the techniques (separation of variables, 
Fourier analysis) have already been encountered in finding solutions of the 
wave equation. 


Later chapters consider the relationship between the deterministic diffusion 
equation and the random microscopic motions which give rise to diffusion. 
Chapter 5 discusses the central limit theorem, a very general and powerful 
concept in probability theory, which can be used to explain the connection 
between random walks and diffusion in a restricted case. Chapter 6 discusses 
the derivation of the diffusion equation from a microscopic theory, initially 
using the central limit theorem, and then by giving a derivation of a rather 
general form of the diffusion equation, called the Fokker–Planck equation. 


It may be useful to include a few words about closely related areas which will 
not be treated here. 


A vast range of phenomena can be described by random processes, and we make 
no attempt to give anything like a complete or systematic treatment. Numerous 
other examples can be found by consulting books on stochastic processes. Our 
discussion of random functions will concentrate on the random walk, because 
of its close connection with the diffusion equation. 


There is also an important area of theoretical physics known as statistical me- 
chanics, which combines statistical and mechanical principles to define quanti- 
ties such as temperature, and which leads to methods for calculating properties 
of materials in thermal equilibrium. This too is outside the scope of this course. 
Although the discussion of the microscopic mechanism of diffusion involves a 
combination of statistical and dynamical ideas, it is distinct from statistical 
mechanics in that this course does not require the mechanical definition of 
temperature. 


CHAPTER 1 


Probability and statistics 


This chapter discusses the concepts from probability and statistics which 
you will need for the remainder of this block. Many of the ideas may be 
familiar from earlier courses, but the notation, terminology and definitions 
may be different. 


1.1 Probability 


1.1.1 Probability for discrete events 


If you perform some procedure where the set of possible outcomes is known, 
but the actual outcome is uncertain, it is useful to describe each possible 
outcome in terms of its probability. The simplest example is tossing a coin, 
where there are two possible outcomes, namely the coin landing heads up 
or tails up. This is a commonly used method to make an unbiased random 
decision between two possibilities. Most people would be comfortable with 
the statement that both outcomes are equally likely. The equivalent mathe- 
matical statement is that both outcomes have probability equal to one half. 
We start by reviewing how probabilities are defined. 


Suppose that we undertake a particular ‘trial’ or experiment in which there 
are n possible distinct outcomes: 


outcome 1, outcome 2, ..., outcome n. (1.1) 

(Throughout this block we shall use the terms ‘outcome’ and ‘event’ 
interchangeably.) 


We perform the experiment N times, and count how many times each outcome 
occurs, say 

$$N_1, N_2, \ldots, N_n, \qquad (1.2)$$

where $N_i$ is the number of times outcome i is observed; $N_i$ is sometimes 
referred to as the frequency of outcome i. We now examine the values 
of $N_i/N$ as N increases. Provided that the circumstances of the trial do 
not change, we expect that the values of $N_i/N$ will approach a limit as N 
increases. The probability $P_i$ of the occurrence of outcome i is defined as 

$$P_i = \lim_{N \to \infty} \frac{N_i}{N}. \qquad (1.3)$$

The sum of the number of times $N_i$ that each distinct outcome occurs is 
equal to N, i.e. $\sum_{i=1}^{n} N_i = N$. The sum of the probabilities of all possible 
outcomes is therefore 

$$\sum_{i=1}^{n} P_i = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{n} N_i = \lim_{N \to \infty} \frac{N}{N}, \qquad (1.4)$$

so 

$$\sum_{i=1}^{n} P_i = 1. \qquad (1.5)$$


Sometimes probabilities are known on theoretical grounds. For example, 
when a coin is tossed, there are n = 2 outcomes, and we expect that the coin 
is as likely to fall ‘tails’ as it is to fall ‘heads’ (assuming that the coin is 
fair). One therefore expects that the probability of falling heads, $P_h$, and 
the probability of falling tails, $P_t$, are equal (and equal to one half, because 
equation (1.5) gives $P_h + P_t = 1$). Similarly, a six-sided die is expected to 
have an equal probability of falling with any face upwards, i.e. $P_i = 1/6$ for 
i = 1, ..., 6. Any trial where nothing favours any one of n possible outcomes 
has probability $P_i = 1/n$ for any outcome. 
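
The limiting-frequency definition of probability can be explored numerically. The following is a minimal sketch, assuming that Python's pseudo-random generator is an adequate stand-in for a fair die; the function name `relative_frequencies` is illustrative and not part of the course text.

```python
import random
from collections import Counter


def relative_frequencies(num_trials, num_outcomes=6, seed=0):
    """Return the relative frequency N_i / N of each face i of a simulated fair die."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, num_outcomes) for _ in range(num_trials))
    return {i: counts[i] / num_trials for i in range(1, num_outcomes + 1)}


if __name__ == "__main__":
    for n in (60, 6_000, 600_000):
        freqs = relative_frequencies(n)
        # The frequencies should drift towards 1/6 = 0.1667 as n increases.
        print(n, [round(freqs[i], 3) for i in sorted(freqs)])
```

The relative frequencies always sum to one for any number of trials, which mirrors the argument leading to equation (1.5).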


Exercise 1.1 


Try tossing a coin N times, and record the number of heads $N_h$. As you increase N, 
you will find that $N_h/N$ approaches one half, slowly and erratically. Do not let N 
be too large; about 40 should suffice for this experiment. 


In other circumstances, probabilities are not known exactly, and must be 
determined empirically, by repeated trials. An example would be a surgical 
procedure where the outcome is either success or failure. If an operation is 
performed 943 times, and the outcome is successful 455 times, the probability 
of success may be presumed to be close to $455/943 \approx 0.48$. There are 
interesting problems associated with deciding how accurate this estimate is. 
For example, could you reliably conclude that a variation of the operating 
procedure, which had been successful on 7 out of 10 occasions, should be 
adopted in preference to the original? Such questions are treated in many 
texts on statistics, but they will be ignored in this course. Here we shall 
be concerned with situations in which the probabilities for the whole set 
of possible outcomes can be deduced from theoretical arguments. However, 
the following example illustrates an interesting use of empirical estimates of 
probabilities. 


Example 1.1 


Let $P_e$ be the probability that a letter chosen at random on a page of English 
text is the letter e. Estimate $P_e$ by counting letters in the first sentence 
of the preceding paragraph (that is, the sentence starting ‘In other circumstances 
...’), then in the whole of that paragraph. 


Solution 


In the first sentence there are 97 letters, of which 13 are the letter e, giving 
the estimate $P_e \approx 13/97 = 0.134$. In the whole paragraph there are 705 
letters, of which 98 are the letter e, giving $P_e \approx 98/705 = 0.139$. The frequency 
of the letter e in longer texts is close to this value; for example, 
in the book of Psalms, in the King James Bible, it has been quoted as 
$P_e \approx 0.13$. Incidentally, estimates of letter frequencies have often 
been used to decipher simple substitution codes; an amusing example of 
this can be found in one of Arthur Conan Doyle’s Sherlock Holmes stories, 
‘The adventure of the dancing men’. 
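
The counting in this solution is easy to automate. The sketch below is one possible way to do it; the helper name `letter_frequency` is an assumption made for illustration, and the count naturally depends on the exact wording of the text that is supplied.

```python
def letter_frequency(text, letter="e"):
    """Return (fraction of letters equal to `letter`, total number of letters).

    Only alphabetic characters are counted; case is ignored.
    """
    letters = [ch.lower() for ch in text if ch.isalpha()]
    if not letters:
        raise ValueError("text contains no letters")
    return letters.count(letter.lower()) / len(letters), len(letters)


first_sentence = (
    "In other circumstances, probabilities are not known exactly, "
    "and must be determined empirically, by repeated trials."
)

if __name__ == "__main__":
    fraction, total = letter_frequency(first_sentence)
    print(f"{total} letters, estimate P_e = {fraction:.3f}")  # 97 letters, 0.134
```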


1.1.2 Probabilities for combinations of events 


Sometimes we are interested in outcomes that are combinations of different 
elementary outcomes for which the probabilities are already known. (Ele- 
mentary outcomes are the simplest possible outcomes in terms of which all 
other more complicated outcomes can be expressed.) For example, what 
is the probability of throwing a number less than 3 (either 1 or 2) with a 
six-sided die? The number of times this happens is $N_{1\,\mathrm{or}\,2} = N_1 + N_2$, where 
$N_1$ and $N_2$ are the numbers of trials where 1 and 2 are thrown, respectively. 
It follows from equation (1.3) that the probability of throwing 1 or 2 is 
$P_{1\,\mathrm{or}\,2} = P_1 + P_2$. (We know that $P_1 = 1/6$ and $P_2 = 1/6$ are the probabilities 
of throwing 1 and 2, respectively, so $P_{1\,\mathrm{or}\,2} = 1/3$.) In general, you can add 
the probabilities of outcomes which are mutually exclusive events to give the 
probability of the combined event: the probability of observing outcome A 
or outcome B is 

$$P_{A\,\mathrm{or}\,B} = P_A + P_B. \qquad (1.6)$$

As an example of two events which are not mutually exclusive, a person can 
have both blonde hair and blue eyes. In cases like this, the generalisation of 
equation (1.6) is 

$$P_{A\,\mathrm{or}\,B} = P_A + P_B - P_{A\,\mathrm{and}\,B}, \qquad (1.7)$$

where $P_{A\,\mathrm{and}\,B}$ is the probability of outcomes A and B occurring in the same 
trial. (Subtracting the term $P_{A\,\mathrm{and}\,B}$ avoids double-counting the cases where 
events A and B both happen.) This more general version of equation (1.6) will 
not be required in the remainder of the course. 
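
Equations (1.6) and (1.7) can be verified by direct enumeration for a fair six-sided die. The sketch below uses exact fractions; the events "even" and "below three" are chosen purely for illustration and do not appear in the course text.

```python
from fractions import Fraction

DIE = frozenset(range(1, 7))  # the six equally likely elementary outcomes


def prob(event):
    """Probability of an event, given as a set of faces of a fair die."""
    return Fraction(len(event & DIE), len(DIE))


one, two = {1}, {2}
even, below_three = {2, 4, 6}, {1, 2}  # these two events overlap at the face 2

# Mutually exclusive events: equation (1.6)
assert prob(one | two) == prob(one) + prob(two) == Fraction(1, 3)

# Overlapping events: equation (1.7) subtracts the double-counted face
assert prob(even | below_three) == (
    prob(even) + prob(below_three) - prob(even & below_three)
)

print(prob(one | two), prob(even | below_three))  # 1/3 2/3
```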


Exercise 1.2 


What is the probability of drawing a picture card (King, Queen or Jack) from a 
shuffled deck of 52 playing cards? 


Exercise 1.3 


The table below lists the number of deaths recorded in England in a given year, for 
people in given age ranges. Use this table to estimate the probability that someone 
will die between the ages of 25 and 64. What assumption have you used? 


Table 1.1 


Number of deaths 


6 000 
1 200 
4 400 
14 600 
88 200 
141 300 
319 500 




1.1.3 Probabilities for successive trials 


In some cases we might need probabilities for events at successive trials. 
If the trials are independent, meaning that the outcome of one trial does 
not influence another, the probabilities are multiplied. For example, the 
probability of drawing an ace from a deck of cards is 4/52 = 1/13, so the 
probability of drawing an ace on two successive independent trials, with the 
card that was drawn replaced and the deck shuffled, is $\frac{1}{13} \times \frac{1}{13} = \frac{1}{169}$. In 
general, if a trial is repeated, then the probability of obtaining result ‘a’ on 
the first trial and ‘b’ on the second trial is 

$$P_{ab} = P_a P_b, \qquad (1.8)$$

with $P_{ab}$ denoting the probability of the events occurring in succession. This 
applies only if the trials are independent. In the above example, if the first 
card chosen is not returned to the deck, then the probability of drawing a 
second ace from the deck, having found one on the first draw, is 3/51, and 
the probability of two aces in succession is $\frac{4}{52} \times \frac{3}{51} = \frac{1}{221}$. 
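
Both results for the two-ace example can be confirmed by listing every ordered pair of draws. This is a brute-force sketch rather than an efficient method; the deck representation and the function name `prob_two_aces` are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import permutations, product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]  # 52 cards


def prob_two_aces(with_replacement):
    """Probability of two aces on successive draws, found by enumeration."""
    if with_replacement:
        draws = product(DECK, repeat=2)   # first card returned: 52 * 52 pairs
    else:
        draws = permutations(DECK, 2)     # first card kept out: 52 * 51 pairs
    total = 0
    hits = 0
    for first, second in draws:
        total += 1
        if first[0] == "A" and second[0] == "A":
            hits += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    print(prob_two_aces(with_replacement=True))   # 1/169
    print(prob_two_aces(with_replacement=False))  # 1/221
```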


When successive events are not independent, the probability of event ‘b’ following 
event ‘a’ is written 

$$P_{ab} = P_a P_{b\,\mathrm{given}\,a}, \qquad (1.9)$$

where $P_{b\,\mathrm{given}\,a}$ is the probability that event ‘b’ happens on the second trial if 
event ‘a’ has occurred on the first trial. In the preceding example, events ‘a’ 
and ‘b’ are both drawing an ace from the deck, but the successive trial is such 
that the second card is drawn while not returning the first drawn card to the 
deck, so we have $P_a = \frac{4}{52}$ and $P_{b\,\mathrm{given}\,a} = \frac{3}{51}$. 


In the remainder of this course we shall always be dealing with situations where 
events are independent, so that $P_{b\,\mathrm{given}\,a} = P_b$, and equation (1.8) can be used. 
This would be the case when events ‘a’ and ‘b’ are drawing aces from the deck 
but drawing the second card only after returning the first drawn card to the 
deck and reshuffling. 


Exercise 1.4 


When throwing two six-sided dice at the same time, what is the probability that 
no number thrown on either die is higher than two? 


First derive the result from equation (1.8), then check it by counting all of the 
possible pairs of throws with neither number higher than two. 


Exercise 1.5 


If a sport can be practised only on approximately four days out of ten due to 
weather limitations, and an enthusiast has time available only at weekends, what 
is the probability that she can practise the sport on any given day? Approximately 
how many suitable days would be available each year? 


Exercise 1.6 


Show that when two six-sided dice are thrown, the probability that the sum of the 
scores is equal to seven is $\frac{1}{6}$. 


1.1.4 Continuous random variables 


We can generalise those concepts developed for discrete events to the case 
of trials having a continuous range of possible outcomes, such as when a 
measurement of a quantity yields a real number x. An example of such a 
trial would be the measurement of the height in cm of a person chosen at 
random from a large population. (We are considering an idealised situation 
where a person’s height can be measured with such great precision that it 
may be regarded as a continuous variable.) When the set of possible values 
of x is continuous, it is not useful to ask for the probability that x takes a 
given value, such as $x_0$, because (except in exceptional circumstances) the 
number will never be precisely equal to $x_0$. We can ask, instead: ‘What 
is the probability that x lies in an interval between $x_0$ and $x_0 + \delta x$?’ For 
example, it is meaningful to ask the question ‘What is the probability that 
an adult will have a height between 190 cm and 191 cm?’, but it is not useful 
to ask ‘What is the probability that an adult will have a height of exactly 
190 cm?’, since this probability would be vanishingly small if the height is 
measured to arbitrary accuracy. 


The probability that the height of an individual selected from a very large 
specified population is between $x_1$ and $x_2$ will be a function of $x_1$ and $x_2$, 
and will be written $P(x_1, x_2)$. (We shall assume throughout that $x_2 > x_1$.) 
Now, if $\delta x = x_2 - x_1$ is small, it is natural to expect that this probability 
will be proportional to $\delta x$. (In the example given above, you might reasonably 
expect that the probability for a person’s height to be between 190 cm 
and 192 cm is roughly double the probability that it is between 190 cm and 
191 cm.) A more precise statement is that one expects that 

$$p(x) = \lim_{\delta x \to 0} \frac{P(x, x + \delta x)}{\delta x} \qquad (1.10)$$

exists for a sufficiently well-behaved $P(x_1, x_2)$ (i.e. partial derivatives of 
$P(x_1, x_2)$ exist at $x_1 = x_2 = x$). The function p(x) is called the probability 
density of the continuous random variable x. Note that this function cannot 
take negative values, because the probability is never negative. 


We shall often find it convenient to discuss the probability $\delta P$ that x lies in 
a small interval. We shall refer to this as the element of probability: $\delta P$ is 
the probability that the random variable lies in the interval $[x, x + \delta x]$ with 
a small width $\delta x$. We have 

$$\delta P = P(x, x + \delta x) = p(x)\,\delta x + O(\delta x^2) \qquad (1.11)$$

(where $\delta x$ is assumed to be positive). The approximation $\delta P \approx p(x)\,\delta x$ is 
valid for sufficiently small values of $\delta x$ (provided that $P(x_1, x_2)$ is differentiable). 
The probability $P(x_1, x_2)$ may be expressed as an integral of the 
probability density function: 

$$P(x_1, x_2) = \int_{x_1}^{x_2} dx\, p(x). \qquad (1.12)$$

It is worth emphasising at this point that equation (1.12) can serve as a 
more general definition of the probability density function; it does not require 
the assumption that $P(x_1, x_2)$ be differentiable, as was needed for 
equations (1.10) and (1.11). Of course, for a sufficiently well-behaved p(x), 
equation (1.11) follows directly and simply from equation (1.12), and Example 1.2, 
below, will show that equation (1.10) can follow directly from 
equation (1.12). For these reasons, equation (1.12) is usually taken to be 
the preferred definition of p(x). 
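
The relationship between equations (1.10), (1.11) and (1.12) can be seen numerically. The sketch below assumes an illustrative density p(x) = 2x on the interval [0, 1], which is not taken from the course text, and evaluates the integral in equation (1.12) with a simple midpoint rule; the function names are hypothetical.

```python
def density(x):
    """Illustrative probability density p(x) = 2x on [0, 1] (an assumed example)."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def interval_probability(x1, x2, steps=10_000):
    """Approximate P(x1, x2), the integral of p(x) from x1 to x2, by the midpoint rule."""
    width = (x2 - x1) / steps
    return sum(density(x1 + (k + 0.5) * width) for k in range(steps)) * width


if __name__ == "__main__":
    x = 0.3
    for dx in (0.1, 0.01, 0.001):
        ratio = interval_probability(x, x + dx) / dx
        # For this density the ratio is 2x + dx, so it tends to p(x) as dx -> 0,
        # and the difference from p(x) reflects the O(dx^2) term in equation (1.11).
        print(f"dx = {dx:<6} P(x, x + dx)/dx = {ratio:.4f}   p(x) = {density(x):.4f}")
```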


Because the continuous random variable x has to take some value between 
$-\infty$ and $+\infty$, we have 

$$P(-\infty, \infty) = \lim_{x_1 \to -\infty} \lim_{x_2 \to \infty} P(x_1, x_2) = 1. \qquad (1.13)$$

It follows from equation (1.12) that the probability density satisfies $p(x) \ge 0$ 
and 

$$\int_{-\infty}^{\infty} dx\, p(x) = 1. \qquad (1.14)$$

A function must satisfy the normalisation condition (1.14) if it is to be a 
valid probability density. A function which satisfies condition (1.14) is said 
to be normalised. 


In many cases it is known that the random variable x lies in a range between 
some numbers $x_{\min}$ and $x_{\max}$, so that p(x) vanishes for $x < x_{\min}$ and for 
$x > x_{\max}$. It follows that in this case equation (1.14) becomes 

$$\int_{x_{\min}}^{x_{\max}} dx\, p(x) = 1. \qquad (1.15)$$
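
In practice the normalisation condition is often used to fix an unknown constant factor in a density. The sketch below illustrates this for an assumed shape x(1 - x) on [0, 1], chosen only as an example and not taken from the course text; the helper names `integrate` and `normalisation_constant` are hypothetical, and the integral is approximated by a midpoint rule.

```python
def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f from a to b by the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width


def normalisation_constant(f, x_min, x_max):
    """Return the constant A for which A * f(x) satisfies condition (1.15)."""
    total = integrate(f, x_min, x_max)
    if total <= 0:
        raise ValueError("function must have a positive integral")
    return 1.0 / total


def shape(x):
    """Unnormalised, non-negative function on [0, 1] (an assumed example)."""
    return x * (1.0 - x)


A = normalisation_constant(shape, 0.0, 1.0)


def p(x):
    """Normalised probability density built from the assumed shape."""
    return A * shape(x)


if __name__ == "__main__":
    print(f"A = {A:.6f}")                                   # close to 6
    print(f"integral of p = {integrate(p, 0.0, 1.0):.6f}")  # close to 1
```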


Example 1.2 


Show that equation (1.12) implies equation (1.10). [Hint: Consider the 
partial derivative of equation (1.12) with respect to $x_2$, and assume it exists.] 


Solution 


From the definition of the partial derivative, equation (1.10) can be rewritten 
as 

$$p(x_1) = \left. \frac{\partial P}{\partial x_2}(x_1, x_2) \right|_{x_2 = x_1} \qquad (1')$$

(that is, we take the partial derivative with respect to $x_2$, then set $x_2 = x_1$). 
Differentiating equation (1.12) gives 

$$\frac{\partial P}{\partial x_2}(x_1, x_2) = \frac{\partial}{\partial x_2} \int_{x_1}^{x_2} dx\, p(x) = p(x_2); \qquad (2')$$

then setting $x_2 = x_1$ gives the same result as equation (1′). 

It can often be useful to think in terms of a graph of the probability density 
p(x). Equation (1.12) shows that the probability that the random variable 
takes a value between $x_1$ and $x_2$ is the area under the curve for p(x) 
between $x_1$ and $x_2$ (see Figure 1.1). 


p (x) 


0 xX] x2 i 4 


Figure !.1_ ‘The probability that a random variable x takes a value between x; and 
x2 is equal to the integral of the probability density p(x) between these two limits 
(that is, the area of the shaded region under the curve) 
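The area interpretation can be checked numerically. The following is a minimal sketch, not part of the course text: the density `p_example` ($p(x) = 2x$ on $[0, 1]$) and the helper `probability` are hypothetical names chosen for illustration, and the integral in equation (1.12) is approximated by the midpoint rule.

```python
def probability(p, x1, x2, steps=10_000):
    """Midpoint-rule estimate of P(x1, x2), the area under p between x1 and x2."""
    width = (x2 - x1) / steps
    return sum(p(x1 + (k + 0.5) * width) for k in range(steps)) * width


def p_example(x):
    """Hypothetical density (an assumption): p(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


total = probability(p_example, 0.0, 1.0)   # normalisation check, equation (1.15)
part = probability(p_example, 0.25, 0.5)   # exact value is 0.5**2 - 0.25**2
print(total, part)
```

For this density the exact results are 1 and 0.1875, so the printed values should agree with these to within rounding error.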


One very simple probability density is the uniform distribution, where the
probability density $p(x)$ for the random variable $x$ is independent of $x$
for all $x$ within some interval, and zero elsewhere. A uniform distribution
for a random variable which lies between $x_1$ and $x_2$ (where $x_2 > x_1$)
has probability density

$$p(x) = \begin{cases} \text{constant}, & x_1 \le x \le x_2, \\ 0, & \text{otherwise.} \end{cases} \tag{1.16}$$


Exercise 1.7


By considering the normalisation of the probability density equation (1.16), show
that the value of the constant is $1/(x_2 - x_1)$.


Exercise 1.8 


The pilot of an aircraft is expected to check the tyre pressures before each flight. 
The valve on one of the wheels is hidden behind other structures when it is rotated 
less than 20° either side of the vertical line from the axle, in which case the aircraft 
will have to be moved before the tyre pressure can be checked (see Figure 1.2). 


Figure 1.2 What is the probability that the tyre valve will be hidden when the
aircraft is parked? (Labels in the diagram: supporting structure, tyre valve.)


What is the probability density for the angle of the tyre valve relative to the vertical? 
What is the probability that the aircraft will have to be moved to allow access to 
the valve? 


Exercise 1.9 


(a) The length of time that you have to wait for a bus is apparently random. The
time $t$ may be regarded as a random variable with a probability density $p(t)$.
Some plausible assumptions indicate that for $t > 0$, $p$ is an exponential
function $p(t) = A \exp(-t/t_0)$, where $t_0$ is a typical waiting time. (These
assumptions are discussed in Exercise 1.32 at the end of this chapter.) Show
that the probability density is normalised if $A = 1/t_0$.


(b) A regular passenger (with an understanding of probability) knows that
$t_0 = 15$ minutes. He decides that if his bus takes longer than 45 minutes to
appear, then the drivers have probably gone on strike, and he walks home. What
is the probability that he will walk home if the buses are running normally?




1.1.5 Two or more random variables


It is often necessary to consider situations where two or more random
variables are measured or observed. The probability density for measurement
of two or more random variables is defined in a similar way to that of a
single random variable. For example, the height $h$ and weight $w$ of a person
drawn at random from a large population can be regarded as two random
variables. The probability $\delta P$ that the height of a person lies between
$h$ and $h + \delta h$, and his or her weight lies between $w$ and
$w + \delta w$, is expected to be proportional to both of the small increments
$\delta h$ and $\delta w$. It may be written as a generalisation of
equation (1.11):

$$\delta P \approx p(h, w)\, \delta h\, \delta w, \tag{1.17}$$

where $p(h, w)$ is the joint probability density for height and weight.


In general, given two random variables $x$ and $y$, the element of probability
$\delta P$ that $x$ lies in the interval $[x, x + \delta x]$ and $y$ lies in
the interval $[y, y + \delta y]$ (where $\delta x$ and $\delta y$ are small) is
written

$$\delta P = p(x, y)\, \delta x\, \delta y. \tag{1.18}$$

The quantity $p(x, y)$ is called the joint probability density function for $x$
and $y$.


Equation (1.12) connects the probability density with a probability for a
single random variable. Let us consider the analogous relation for two random
variables, which gives the probability $P_A$ that the point $(x, y)$ lies
inside a region $A$ in the $(x, y)$-plane (see Figure 1.3).

Figure 1.3 The probability that a pair of random variables takes values lying
inside a small rectangle of size $\delta x \times \delta y$ at $(x, y)$ is
$p(x, y)\, \delta x\, \delta y$. The probability that the pair of random
variables $(x, y)$ lies in a region $A$ is the integral of $p$ over the
region $A$.


Summing the contributions from every element, and taking the limit as
$\delta x, \delta y \to 0$, we see that $P_A$ is expressed as an area integral
of the density $p(x, y)$ over the region $A$:

$$P_A = \iint_A dx\, dy\, p(x, y). \tag{1.19}$$

Again, just as equation (1.12) can be regarded as the more fundamental
definition of the probability density function $p(x)$, equation (1.19) is to be
viewed as the preferred definition of the joint probability density $p(x, y)$.


If the joint probability density factorises such that

$$p(x, y) = p_1(x)\, p_2(y), \tag{1.20}$$

where $p_1(x)$ and $p_2(y)$ are the probability densities for single
observations of each of the variables, then these random variables are said to
be independent. This expression is analogous to equation (1.8) for the case of
discrete probability distributions. The term 'independent' is being used in the
same sense as for discrete random variables, because equation (1.20) implies
that determining $x$ has no influence on a measurement of $y$ (and vice versa).


The following is an example of a joint probability density for two independent
random variables. The length of time that someone has to wait for their bus in
the morning is $t_1$. This is a random variable, with probability density
$p_1(t_1)$. The waiting time for the bus for the evening return journey is
$t_2$, with probability density $p_2(t_2)$. The probability that the bus
arrives after a waiting time between $t_1$ and $t_1 + \delta t_1$ is
$\delta P_1 = p_1(t_1)\, \delta t_1$. It is very reasonable to assume that the
waiting time in the morning has no relation to that in the evening, in that
whatever the value of $t_1$, the probability that the second wait is between
$t_2$ and $t_2 + \delta t_2$ is $\delta P_2 = p_2(t_2)\, \delta t_2$.
Equation (1.8) implies that the probability that the waiting time in the
morning is between $t_1$ and $t_1 + \delta t_1$, while that in the evening is
between $t_2$ and $t_2 + \delta t_2$, is
$\delta P = p_1(t_1)\, p_2(t_2)\, \delta t_1\, \delta t_2$. It is also, by
definition, $\delta P = p(t_1, t_2)\, \delta t_1\, \delta t_2$, where
$p(t_1, t_2)$ is the joint probability density, so

$$p(t_1, t_2) = p_1(t_1)\, p_2(t_2), \tag{1.21}$$

in agreement with equation (1.20).


Exercise 1.10 


In the situation described above, what is the joint probability density for the
waiting times if $t_1$ and $t_2$ both have the exponential probability density
considered in Exercise 1.9, with $t_0 = 15$ minutes? What is the probability of
waiting less than 15 minutes in the morning and less than 15 minutes in the
evening?


Exercise 1.11


In the situation described in the previous exercise, what is the probability of the
total waiting time $t_1 + t_2$ being less than 15 minutes?


Exercise 1.12 


What is the condition for a joint probability density to be normalised? What other 
condition must a function satisfy if it is to be a valid joint probability density? 


Exercise 1.13 


Two random variables, $x$ and $y$, have joint probability density $p(x, y)$.
Show that the probability density for the random variable $x$, which is here
denoted by $p_x(x)$, is obtained from $p(x, y)$ by integrating over $y$:

$$p_x(x) = \int_{-\infty}^{\infty} dy\, p(x, y). \tag{1.22}$$

Verify that this integral gives the correct result when $x$ and $y$ are
independent variables.


In most cases where we consider probabilities for pairs of random numbers,
we shall be considering variables with a continuous distribution, described
by a joint probability density. Occasionally we shall consider pairs of random
variables which take discrete values. In this case we can define a joint
probability for the two variables. Consider the case where two random variables
$X$ and $Y$ take, respectively, $n$ and $m$ possible discrete values, $X_i$ and
$Y_j$, labelled by the integers $i = 1, \ldots, n$ and $j = 1, \ldots, m$,
respectively. The joint probability $P_{ij}$ is the probability that $X$ takes
the value $X_i$ and $Y$ takes the




value $Y_j$. The results considered above all generalise in a natural way to
the discrete case. For example, the normalisation condition becomes

$$\sum_{i=1}^{n} \sum_{j=1}^{m} P_{ij} = 1, \tag{1.23}$$

which is analogous to the solution of Exercise 1.12. Also, the formula to
obtain the probability $P_i$ that $X$ takes the value $X_i$ is

$$P_i = \sum_{j=1}^{m} P_{ij}, \tag{1.24}$$

which is analogous to equation (1.22).


1.2 Statistics 


1.2.1 Statistics of a single random variable 


Probabilities and probability densities give the fullest possible description 
of random variables. An accurate determination of the probability density 
function of a random variable from repeated trials would require an enor- 
mous number of observations. For this reason, it is customary to describe 
a random variable by means of statistics, which are single numbers, such as
average values, describing some aspect of the range of values taken by the 
random variable. For many purposes, probability densities contain too much 
information to be directly useful, whereas statistics give a concise summary 
of what should be expected. 


The term statistic is popularly used for any piece of information in numerical 
form, for example the number of tons of steel produced last month in Ger- 
many. In our terminology such a number would be not a statistic, but a single 
observation. However, the average monthly steel production over a three-year 
period would be a statistic, because it is a single number describing a set of 
observations. 


The simplest (and best known) statistic is the average. Given a set of $N$
numbers $x_1, x_2, \ldots, x_i, \ldots, x_N$, the average is

$$x_{\rm av} = \frac{1}{N} \sum_{i=1}^{N} x_i. \tag{1.25}$$

The average will depend upon the particular set of observations, but in the
case where the number of observations $N$ approaches infinity, we expect
that the average will approach a limit. We now investigate the limit as
$N \to \infty$, considering first the case where each observation $x_i$ can
take only one of $n$ discrete values, with the possible values of $x_i$ being
labelled $X_j$, $j = 1, \ldots, n$, and with each value occurring $N_j$ times
(with $N = \sum_{j=1}^{n} N_j$). Then the average may be written as

$$x_{\rm av} = \frac{1}{N} \sum_{j=1}^{n} N_j X_j. \tag{1.26}$$




Example 1.3 


Throwing a six-sided die five times gave the results $x_1 = 3$, $x_2 = 6$,
$x_3 = 3$, $x_4 = 6$, $x_5 = 4$, where $x_i$ is the number obtained on
throw $i$. Throwing the same die 25 times gave the sequence


A GBs, igs ee 2 ae ee 2, 4 4, 4,8; 1,1, 1,2, 4,2, 1 
Calculate the average of each sequence. 


These sequences take a discrete set of values, with $n = 6$ possibilities. What
are the values of $X_j$? And what are the values of $N_j$ for the second
sequence? Verify that for the second sequence, equation (1.26) gives the same
answer as the average determined in the first part of this question.


Solution 


The sum of the first sequence is 22, so the average is 22/5 = 4.4. The sum 
of the second sequence is 76, so the average is 76/25 = 3.04. 


The values of $X_j$ are $X_1 = 1, \ldots, X_j = j, \ldots, X_6 = 6$, and their
frequencies in the set of 25 throws are $N_1 = 5$, $N_2 = 6$, $N_3 = 4$,
$N_4 = 6$, $N_5 = 1$, $N_6 = 3$, so

$$\sum_{j=1}^{6} X_j N_j = (1 \times 5) + (2 \times 6) + (3 \times 4) + (4 \times 6) + (5 \times 1) + (6 \times 3) = 76,$$

which gives the same value for the average.
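The arithmetic of Example 1.3 can be reproduced in a few lines. This is only an illustrative sketch: the variable names are assumptions, and the second sequence is represented by the frequencies $N_j$ quoted in the solution rather than by the individual throws.

```python
# First sequence of five throws, as given in Example 1.3.
first = [3, 6, 3, 6, 4]
avg_first = sum(first) / len(first)            # equation (1.25)

# Frequencies N_j of each value X_j in the 25 throws, from the solution.
counts = {1: 5, 2: 6, 3: 4, 4: 6, 5: 1, 6: 3}
N = sum(counts.values())                        # total number of throws
avg_second = sum(X * Nj for X, Nj in counts.items()) / N   # equation (1.26)

print(N, avg_first, avg_second)
```

Working from the frequencies rather than the raw sequence is exactly the regrouping that turns equation (1.25) into equation (1.26).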


In the limit as $N \to \infty$, equation (1.26) may be written in terms of
probabilities:

$$\lim_{N \to \infty} x_{\rm av} = \sum_{j=1}^{n} \lim_{N \to \infty} \frac{N_j}{N} X_j = \sum_{j=1}^{n} P_j X_j. \tag{1.27}$$

Except in some special cases where $n$ is infinite, the average approaches a
definite limit as $N \to \infty$. This limit is known as the mean value or
expectation value, and is given the symbol $\langle x \rangle$. Angular
brackets will be used to denote taking the limit of the average of the quantity
inside the brackets, when the number of observations $N$ approaches infinity.
The mean value is given by the expression

$$\langle x \rangle = \sum_{j=1}^{n} P_j X_j. \tag{1.28}$$

Exercise 1.14 


For the situation considered in Example 1.3 (throwing a six-sided die), what should 
be the limit of the average value as the number of throws is increased? 


This formula for the mean generalises to the case of a continuous random
variable with probability density $p(x)$. The generalisation can be derived
by dividing the set of possible values of $x$ into small intervals of length
$\delta x$. Let us estimate the number of times, $N_j$, that the variable $x$
lies between $x_j = j\, \delta x$ and $x_{j+1} = x_j + \delta x$ within a total
number of $N$ observations. This is approximately equal to $N$ times the
probability that $x$ lies in this interval,




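that is, $N_j \approx N p(x_j)\, \delta x$. The mean value is

$$\langle x \rangle = \lim_{N \to \infty} x_{\rm av} = \lim_{N \to \infty} \sum_j \frac{N_j}{N}\, x_j \approx \sum_j p(x_j)\, x_j\, \delta x. \tag{1.29}$$

Taking the limit as $\delta x \to 0$, the sum becomes an integral:

$$\langle x \rangle = \int_{-\infty}^{\infty} dx\, p(x)\, x. \tag{1.30}$$

(At this point it may be useful to review the discussion of definite integrals
in Block 0.)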


Exercise 1.15 


Calculate the mean value of a random variable $x$ which has a probability
density which is uniform on the interval $[x_1, x_2]$.


Exercise 1.16 


Calculate the mean of the exponential distribution introduced in Exercise 1.9,
namely $p(x) = \exp(-x/x_0)/x_0$ for $x > 0$, and $p(x) = 0$ for $x < 0$.


The mean value is not the only statistic which is of interest. The moments
$M_k$ of a random variable $x$ are defined to be the mean values of $x^k$
(where $k$ is a non-negative integer): in the case where $x$ has a continuous
distribution, the $k$th moment is

$$M_k = \langle x^k \rangle = \int_{-\infty}^{\infty} dx\, x^k\, p(x). \tag{1.31}$$

Clearly, $M_1$ is the mean, and (recalling the normalisation condition (1.14))
$M_0 = 1$.


In some cases the integral (1.31) may diverge, and one may encounter a
probability density function $p(x)$ for which only one or two moments exist.
(Clearly, $M_0 = 1$ has to exist for $p(x)$ to be a valid probability density
function.)


Exercise 1.17 


Consider a uniformly distributed random variable $x$ on the interval $[0, 1]$.
Determine the moments $M_k$.


Exercise 1.18 


Sometimes we may consider moments $\langle x^k \rangle$ in the case where $x$
has a discrete distribution. Write down the expression for the moment
$\langle x^k \rangle$ when the random variable takes $n$ discrete values $X_j$,
with probabilities $P_j$, $j = 1, \ldots, n$.


Determine $\langle x^2 \rangle$ where $x$ is the number uppermost on throwing a die.


Having defined the mean value of a random variable $x$, it is useful to give a
statistic describing how wide is the variation of its values. An appropriate
statistic is called the standard deviation, denoted by $\sigma$. The square of
$\sigma$ is called the variance, written ${\rm Var}(x)$, and it is defined as
the mean value of the square of the difference $\Delta x$ between the measured
value and the mean value. Mathematically, these quantities are defined by the
relations

$$\sigma^2 = {\rm Var}(x) = \langle (\Delta x)^2 \rangle, \qquad \Delta x = x - \langle x \rangle. \tag{1.32}$$




The variance can be expressed in a form which is more convenient for
calculation, in terms of the second moment. Expanding the square in
equation (1.32) and using equation (1.31) gives

$${\rm Var}(x) = \int_{-\infty}^{\infty} dx\, [x^2 - 2\langle x \rangle x + \langle x \rangle^2]\, p(x)$$
$$= \int_{-\infty}^{\infty} dx\, x^2\, p(x) - 2\langle x \rangle \int_{-\infty}^{\infty} dx\, x\, p(x) + \langle x \rangle^2 \int_{-\infty}^{\infty} dx\, p(x). \tag{1.33}$$

The integrals in this expression are, respectively, the moments $M_2$,
$M_1 = \langle x \rangle$ and $M_0 = 1$, so

$${\rm Var}(x) = \langle x^2 \rangle - \langle x \rangle^2 = M_2 - M_1^2. \tag{1.34}$$

This expression usually gives the simplest route to calculating the variance.
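The equivalence of the definition (1.32) and the moment formula (1.34) can be illustrated with a small discrete example, where sums over probabilities replace the integrals. The three-valued distribution below is hypothetical (it is not taken from the text), and exact fractions are used so that the comparison is free of rounding error.

```python
from fractions import Fraction

# Hypothetical discrete distribution (an assumption for illustration):
# values X_j with probabilities P_j.
values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]


def moment(k):
    """k-th moment <x^k> = sum_j P_j X_j^k (discrete analogue of eq. (1.31))."""
    return sum(P * X**k for X, P in zip(values, probs))


mean = moment(1)

# Variance from the definition: mean of the squared deviation from the mean.
var_from_definition = sum(P * (X - mean) ** 2 for X, P in zip(values, probs))

# Variance from the moments: M_2 - M_1^2.
var_from_moments = moment(2) - moment(1) ** 2

print(moment(0), mean, var_from_definition, var_from_moments)
```

Both routes give the same variance, and the zeroth moment equals 1, as the normalisation condition requires.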


Exercise 1.19 


Consider a uniformly distributed random variable $x$ on the interval $[0, 1]$.
Show that the variance of $x$ is ${\rm Var}(x) = \frac{1}{12}$.


1.2.2 Statistics involving two or more random variables


When we have more than one random variable, it becomes harder to gain 
accurate information about probability density functions. Also, even if the 
probability density function is known, it is harder to interpret functions 
of several variables. Using statistics to convey information about random 
variables becomes especially useful when dealing with two or more random 
variables. 


Exercise 1.20 


Two random variables have a joint probability density $p(x, y)$. Using the
result of Exercise 1.13, show that the mean value of $x$ is

$$\langle x \rangle = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, x\, p(x, y). \tag{1.35}$$


Given a joint probability density $p(x, y)$, the most general statistic we
consider is $\langle f(x, y) \rangle$, where $f(x, y)$ is any function of $x$
and $y$. If we continue sampling the random variables, finding a sequence of
pairs of values $(x_i, y_i)$, $i = 1, \ldots, N$, we can calculate the average

$$f_{\rm av} = \frac{1}{N} \sum_{i=1}^{N} f(x_i, y_i). \tag{1.36}$$

The limiting value of this average of $f(x, y)$ as the number of pairs $N$ goes
to infinity is the expectation value $\langle f(x, y) \rangle$. The formula for
the expectation value of $f(x, y)$ can be surmised:

$$\langle f(x, y) \rangle = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, f(x, y)\, p(x, y). \tag{1.37}$$


(Expectation is an alternative name for mean, and both will be used
interchangeably in this block.)

This expression may be derived by dividing the $(x, y)$-plane into small
rectangular elements, each of size $\delta x \times \delta y$. The probability
of $x$ and $y$ being within the element centred at coordinates $(x, y)$ is
$\delta P \approx p(x, y)\, \delta x\, \delta y$. The mean value of $f(x, y)$
is thus

$$\langle f(x, y) \rangle \approx \sum f(x, y)\, \delta P \approx \sum f(x, y)\, p(x, y)\, \delta x\, \delta y, \tag{1.38}$$

where the sum runs over all the elements centred at coordinates $(x, y)$.
Taking the limit as $\delta x \to 0$, $\delta y \to 0$, the double sum becomes
the double integral (1.37).


When dealing with two random variables, a useful concept is the correlation, 
which measures the extent to which they tend to be related. For example, 
the heights and weights of people are two different random variables, but it 
is expected that tall people are more likely to be of above average weight. 
These variables would be described as ‘correlated’. The tendency of two 
variables to be correlated is described by means of a correlation coefficient. 
The correlation coefficient of two random variables x and y is defined as the 
mean value of the product of the deviation of each random variable from 
its mean value. Expressed in terms of symbols, the correlation coefficient is 
given by 


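$$C_{xy} = \langle \Delta x\, \Delta y \rangle, \quad \text{where} \quad \Delta x = x - \langle x \rangle, \quad \Delta y = y - \langle y \rangle. \tag{1.39}$$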


Note that the quantity $C_{xy}$ is positive if large values of $x$ tend to
occur at the same time as large values of $y$, and negative if large values of
$x$ tend to occur along with small values of $y$.


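The correlation coefficient may be expressed in a form which is more
convenient for calculation, as follows. We write the expectation value in terms
of an integral using equation (1.37), and simplify the result using
equation (1.35) and a similar expression for $\langle y \rangle$:

$$C_{xy} = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, [xy - x\langle y \rangle - \langle x \rangle y + \langle x \rangle \langle y \rangle]\, p(x, y)$$
$$= \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, xy\, p(x, y) - \langle y \rangle \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, x\, p(x, y) - \langle x \rangle \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, y\, p(x, y) + \langle x \rangle \langle y \rangle$$
$$= \langle xy \rangle - \langle x \rangle \langle y \rangle. \tag{1.40}$$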


Exercise 1.21


Show that $C_{xy} = 0$ if $x$ and $y$ are independent.


Exercise 1.22
Verify that $C_{xx} = {\rm Var}(x)$.


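The correlation coefficient is defined in the same way for discrete random
variables. Consider the case where the random variables $x$ and $y$ take
discrete values $X_i$, $i = 1, \ldots, n$, and $Y_j$, $j = 1, \ldots, m$, with
the joint probability of observing $X_i$ and $Y_j$ being $P_{ij}$.


In this case, the expression for $\langle xy \rangle$ is

$$\langle xy \rangle = \sum_{i=1}^{n} \sum_{j=1}^{m} P_{ij} X_i Y_j$$

and the correlation coefficient is given by

$$C_{xy} = \langle xy \rangle - \langle x \rangle \langle y \rangle = \sum_{i=1}^{n} \sum_{j=1}^{m} P_{ij} X_i Y_j - \left( \sum_{i=1}^{n} \sum_{j=1}^{m} P_{ij} X_i \right) \left( \sum_{i=1}^{n} \sum_{j=1}^{m} P_{ij} Y_j \right). \tag{1.41}$$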
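As a concrete illustration of the discrete formula, the sketch below evaluates the correlation coefficient for a small joint probability table. The table `P` and the values `X`, `Y` are hypothetical (chosen so that large $x$ tends to accompany large $y$), and `expect` is a helper name introduced only for this example.

```python
from fractions import Fraction as F

# Hypothetical values and joint probabilities P[i][j] = P(x = X[i], y = Y[j]).
X = [0, 1]
Y = [0, 1]
P = [[F(2, 5), F(1, 10)],
     [F(1, 10), F(2, 5)]]


def expect(f):
    """Expectation <f(x, y)> = sum_i sum_j P_ij f(X_i, Y_j)."""
    return sum(P[i][j] * f(X[i], Y[j])
               for i in range(len(X)) for j in range(len(Y)))


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)

# Correlation coefficient from the product formula <xy> - <x><y> ...
c_product = expect(lambda x, y: x * y) - mean_x * mean_y
# ... and directly from the definition <(x - <x>)(y - <y>)>.
c_definition = expect(lambda x, y: (x - mean_x) * (y - mean_y))

print(c_product, c_definition)
```

The two expressions agree, and the result is positive because most of the probability sits on the pairs (0, 0) and (1, 1).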


Exercise 1.23 


Determine $\langle xy \rangle$ where $x$ and $y$ are the numbers uppermost on
two successive throws of a die.


One further general result, which will be required in Chapter 2, concerns
the sum $X$ of $M$ random variables $x_i$, $i = 1, \ldots, M$:

$$X = \sum_{i=1}^{M} x_i. \tag{1.42}$$

In general, these random variables have a joint probability density function
$p(x_1, x_2, \ldots, x_M)$, such that the element of probability $\delta P$ for
finding the first variable between $x_1$ and $x_1 + \delta x_1$, the second
between $x_2$ and $x_2 + \delta x_2$, etc., is

$$\delta P = p(x_1, x_2, \ldots, x_M)\, \delta x_1\, \delta x_2 \cdots \delta x_M. \tag{1.43}$$


The mean value of the sum of these random variables is given by a simple
relation, namely the sum of the mean values of each variable:

$$\langle X \rangle = \left\langle \sum_{i=1}^{M} x_i \right\rangle = \sum_{i=1}^{M} \langle x_i \rangle. \tag{1.44}$$


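This is a very useful relationship. To understand how it is obtained, note
that the mean value of each variable is given by a generalisation of
equation (1.35):

$$\langle x_i \rangle = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \cdots \int_{-\infty}^{\infty} dx_M\, x_i\, p(x_1, x_2, \ldots, x_M). \tag{1.45}$$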


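Equation (1.44) can then be obtained as follows:

$$\langle X \rangle = \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_M\, (x_1 + x_2 + \cdots + x_M)\, p(x_1, x_2, \ldots, x_M)$$
$$= \sum_{i=1}^{M} \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \cdots \int_{-\infty}^{\infty} dx_M\, x_i\, p(x_1, x_2, \ldots, x_M)$$
$$= \sum_{i=1}^{M} \langle x_i \rangle. \tag{1.46}$$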
i=1 
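In the case where the variables $x_i$ are independent, the joint probability
density factorises as
$p(x_1, x_2, \ldots, x_M) = p_1(x_1)\, p_2(x_2) \cdots p_M(x_M)$, and the
integrals in equation (1.46) factorise to give

$$\langle x_i \rangle = \int_{-\infty}^{\infty} dx_1\, p_1(x_1) \int_{-\infty}^{\infty} dx_2\, p_2(x_2) \cdots \int_{-\infty}^{\infty} dx_i\, x_i\, p_i(x_i) \cdots \int_{-\infty}^{\infty} dx_M\, p_M(x_M)$$
$$= \int_{-\infty}^{\infty} dx_i\, x_i\, p_i(x_i). \tag{1.47}$$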




The final equality uses the fact that all of the integrals except one are nor- 
malised, according to equation (1.14). Thus the mean of each variable is de- 
termined by a one-dimensional integral, which simplifies the calculation con- 
siderably. Actually, this property holds quite generally, for non-independent 
as well as independent variables, but the reason why it also holds for non- 
independent variables is more complicated and will not be explored further. 
In any case, this more general situation will not arise elsewhere in this block; 
only independent variables will be encountered. 


The evaluation of the mean value of the sum by performing multiple integrals 
is impractical in most cases. However, we have seen that this quantity can 
be calculated very easily if the mean values of each component are known. 
This illustrates a general principle, that when dealing with multiple random 
variables it is usually more efficient to work in terms of statistics rather than 
probability densities. 
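The statement that the mean of a sum equals the sum of the means, whether or not the variables are independent, can be checked on a small discrete example. The joint table below is hypothetical and deliberately not a product of its marginals; the names `X`, `Y`, `P` and `expect` are assumptions made for this sketch.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities P[i][j] = P(x = X[i], y = Y[j]).
X = [1, 3]
Y = [0, 10]
P = [[F(1, 2), F(0)],
     [F(1, 4), F(1, 4)]]


def expect(f):
    """Expectation <f(x, y)> as a double sum over the joint table."""
    return sum(P[i][j] * f(X[i], Y[j])
               for i in range(len(X)) for j in range(len(Y)))


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
mean_sum = expect(lambda x, y: x + y)    # needs the full joint table

# Marginal probabilities, used only to confirm that x and y are dependent.
Px = [sum(row) for row in P]
Py = [sum(P[i][j] for i in range(len(X))) for j in range(len(Y))]
independent = all(P[i][j] == Px[i] * Py[j]
                  for i in range(len(X)) for j in range(len(Y)))

print(mean_x, mean_y, mean_sum, independent)
```

Here `mean_sum` equals `mean_x + mean_y` even though `independent` is `False`, in line with the remark that the result does not rely on independence.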


1.3 The normal distribution 


In Block 0 we considered functions of the form

$$f(x) = A \exp[-\alpha (x - x_0)^2], \tag{1.48}$$

where $A$, $x_0$ and $\alpha$ are constants with $\alpha > 0$, which are known
as Gaussian functions. A random variable having a probability density which is
a Gaussian function (with the coefficients $A$, $x_0$ and $\alpha$ real) is
said to have a Gaussian distribution, or a normal distribution. (The terms
'Gaussian' and 'normal' in this context are used interchangeably.)


Many quantities are found to have probability densities which are well
approximated by the normal distribution. Figure 1.4 is a plot of the
probability density for weights (in kilograms) of packages containing six
apples, displayed as a histogram. It is compared with a smooth curve, which is
a good match to the data. The probability density plotted as the smooth curve
is a normal distribution, and it clearly provides a very good description of
the data.


Figure 1.4 A histogram showing the distribution of weights of 2000 packages each
containing six apples. The continuous curve is a normal distribution, for
comparison. (For the simulated data used to create this example, the parameters
occurring in equation (1.48) are $\alpha = 1600$, $x_0 = 0.6$,
$A = 40/\sqrt{\pi}$. Axes: $p(W)$, from 0 to 20, against $W$, from 0.5 to 0.7.)




You may not be familiar with histograms, so let us pause to discuss the
construction of Figure 1.4. The range of possible weights is divided into small
intervals (in this case, 40 intervals of width 0.005), and the number of weights
in each interval is recorded. The plot consists of rectangular bars. The width
of each bar indicates the extent of each corresponding interval, and the height
of the bar is proportional to the number of weights $N_j$ recorded in that
interval. This is the standard way of presenting empirical information about a
probability distribution. The vertical scale has been chosen so that the total
area of all of the rectangles is unity, so that the histogram approximates the
normalised probability density. The heights of the columns therefore
approximate the probability density function, $p(x)$, which satisfies
equation (1.14).
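The construction just described can be sketched in code. This is an illustration under stated assumptions rather than the procedure used for Figure 1.4: the data are freshly simulated with `random.gauss` using the parameters quoted in the figure caption ($\alpha = 1600$, $x_0 = 0.6$, so that $\sigma^2 = 1/2\alpha$), the seed is arbitrary, and any sample falling outside the range 0.5 to 0.7 is simply ignored.

```python
import math
import random

random.seed(0)                                  # arbitrary seed, for repeatability
alpha, x0 = 1600.0, 0.6                         # parameters from the figure caption
sigma = math.sqrt(1.0 / (2.0 * alpha))          # standard deviation of the Gaussian
samples = [random.gauss(x0, sigma) for _ in range(2000)]

lower, width, bins = 0.5, 0.005, 40             # 40 intervals of width 0.005
counts = [0] * bins                             # N_j for each interval
for w in samples:
    j = math.floor((w - lower) / width)         # index of the interval containing w
    if 0 <= j < bins:                           # ignore samples outside the range
        counts[j] += 1

n_counted = sum(counts)
# Bar height = N_j / (N * width), so that the total area of the bars is unity.
heights = [c / (n_counted * width) for c in counts]
area = sum(h * width for h in heights)
print(area, max(heights))
```

Dividing each count by the number of samples times the bar width is what makes the bar heights comparable with a normalised probability density.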


As the name suggests, normal distributions occur very frequently. The
correspondence clearly cannot be exact in our example because the weight of
a bag of apples cannot be negative, whereas the function (1.48) remains
positive when $x$ is negative. Note, however, that in this case, as can be seen
from Figure 1.4, the probability density function modelled by the Gaussian
function, $f(W)$, takes extremely small values when $W < 0$. In fact, the
normal distribution is often a very good approximation.


Calculations involving normal distributions often require their moments,
which can be obtained from integrals of the form

$$I_n = \int_{-\infty}^{\infty} dx\, x^n \exp(-x^2). \tag{1.49}$$

(We shall confine our attention to the case when $n \ge 0$ is an even integer;
when $n$ is odd, the integral is zero because the integrand is an odd function.)
The first of these integrals was discussed in Block 0:

$$I_0 = \int_{-\infty}^{\infty} dx\, \exp(-x^2) = \sqrt{\pi}. \tag{1.50}$$

The value of $I_2$ is obtained in the following exercise.


Exercise 1.24 


Show that $I_2 = \sqrt{\pi}/2$. [Hint: Use integration by parts.] You may
assume that $x^n \exp(-x^2) \to 0$ as $|x| \to \infty$.


Using integration by parts, it can be shown that the integrals $I_n$ satisfy
the recursion relation

$$I_n = \frac{n-1}{2}\, I_{n-2}. \tag{1.51}$$

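A numerical spot check of the recursion for the integrals $I_n$ is sketched below. The function names are hypothetical, the integral is approximated by the midpoint rule on the finite range $[-10, 10]$ (the integrand is negligible beyond it), and a check of this kind is not a substitute for the analytic verification requested in the exercises.

```python
import math


def I_recursive(n):
    """I_n built from I_0 = sqrt(pi) by repeated use of I_n = (n - 1)/2 * I_(n-2)."""
    if n % 2 == 1:
        return 0.0                      # odd n: the integrand is an odd function
    value = math.sqrt(math.pi)          # I_0, equation (1.50)
    for m in range(2, n + 1, 2):
        value *= (m - 1) / 2.0
    return value


def I_numeric(n, half_width=10.0, steps=40_000):
    """Midpoint-rule estimate of the integral of x**n * exp(-x**2)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += x**n * math.exp(-x * x)
    return total * h


for n in (0, 2, 4, 6):
    print(n, I_recursive(n), I_numeric(n))
```

The two columns should agree closely for each even $n$, which is consistent with the recursion relation.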
Exercise 1.25 


Verify relation (1.51). 


Exercise 1.26


Show that if the Gaussian probability density function (1.48) is normalised
(satisfying equation (1.14) with $f$ replacing $p$), $A$ and $\alpha$ are
related by $A = \sqrt{\alpha/\pi}$.




The reason for the ubiquity 
of normal distributions will 
be discussed in Chapter 5, 
on the central limit 
theorem. 




Exercise 1.27 


Show that the mean and variance of the normal distribution are
$\langle x \rangle = x_0$ and $\sigma^2 = 1/2\alpha$, respectively.


Using the integrals obtained above, it can be verified that if $\sigma$ is the
standard deviation of a normal distribution with mean value $x_0$, the
probability density function may be written in the form

$$p(x) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp[-(x - x_0)^2 / 2\sigma^2]. \tag{1.52}$$

This is a very useful standard form for the normal probability density
function.


Example 1.4


Verify that equation (1.52) does indeed correctly describe the normal
distribution with mean $x_0$ and standard deviation $\sigma$.


Solution


Equation (1.52) is of the same form as equation (1.48), with
$\alpha = 1/2\sigma^2$ and $A = 1/(\sqrt{2\pi}\, \sigma)$. Using the result of
Exercise 1.26, the distribution is normalised because the factors $A$ and
$\alpha$ are related by $A = \sqrt{\alpha/\pi}$. Also, the results of
Exercise 1.27 show that the mean value is $\langle x \rangle = x_0$, and the
variance is $1/2\alpha = \sigma^2$. The density (1.52) does therefore represent
a normalised distribution with mean $x_0$ and standard deviation $\sigma$.


Exercise 1.28


If $x$ has a Gaussian probability density, show that
$\langle (x - x_0)^4 \rangle = 3\sigma^4$.


1.3.1 The error function 


We have already seen that probabilities are determined by integrals of prob- 
ability density functions using expressions such as (1.12). The integral of 
the Gaussian function is therefore a very significant object in probability 
theory. This integral cannot be expressed exactly in terms of elementary 
functions, and is accordingly one of a set of functions known as the special 
functions. The error function is a definite integral of a Gaussian function, 
written erf(x), defined by 


$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x dy\, \exp(-y^2). \tag{1.53}$$


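The interpretation of the error function is as follows. The probability of
a normally distributed variable differing from the mean by less than
$\lambda\sigma$ is $\operatorname{erf}(\lambda/\sqrt{2})$. This is demonstrated
by the following calculation:

$$P(x_0 - \lambda\sigma, x_0 + \lambda\sigma) = \frac{1}{\sqrt{2\pi}\, \sigma} \int_{x_0 - \lambda\sigma}^{x_0 + \lambda\sigma} dx\, \exp[-(x - x_0)^2 / 2\sigma^2]$$
$$= \frac{1}{\sqrt{2\pi}\, \sigma} \int_{-\lambda\sigma}^{\lambda\sigma} du\, \exp[-u^2 / 2\sigma^2]$$
$$= \frac{1}{\sqrt{\pi}} \int_{-\lambda/\sqrt{2}}^{\lambda/\sqrt{2}} dy\, \exp(-y^2) = \operatorname{erf}(\lambda/\sqrt{2}), \tag{1.54}$$

where we used the changes of variables $u = x - x_0$ and
$y = u/\sqrt{2}\, \sigma$, and the final step uses the fact that $\exp(-y^2)$
is an even function. The error function is plotted in Figure 1.5 for $x > 0$.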


Figure 1.5 The error function $\operatorname{erf}(x)$ for $x > 0$
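The interpretation of the error function can be checked numerically with Python's standard `math.erf`. In the sketch below, `prob_within` evaluates $\operatorname{erf}(\lambda/\sqrt{2})$ and `prob_within_numeric` integrates the normal density directly by the midpoint rule; both function names, and the values $x_0 = 0.6$ and $\sigma = 0.02$, are assumptions chosen only for illustration.

```python
import math


def prob_within(lam):
    """Probability of lying within lam standard deviations of the mean."""
    return math.erf(lam / math.sqrt(2.0))


def prob_within_numeric(lam, x0=0.6, sigma=0.02, steps=20_000):
    """Midpoint-rule integral of the normal density over [x0 - lam*sigma, x0 + lam*sigma]."""
    a, b = x0 - lam * sigma, x0 + lam * sigma
    h = (b - a) / steps
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += math.exp(-((x - x0) ** 2) / (2.0 * sigma**2))
    return norm * total * h


for lam in (0.5, 1.0, 2.0):
    print(lam, prob_within(lam), prob_within_numeric(lam))
```

The result does not depend on the particular $x_0$ and $\sigma$ chosen, as the changes of variables in the derivation imply.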


You may also see expressions containing the complementary error function,
$\operatorname{erfc}(x)$, defined by
$\operatorname{erfc}(x) = 1 - \operatorname{erf}(x)$.


Another notation for an integral of a Gaussian function is

$$N(\lambda) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\lambda} dy\, \exp(-y^2/2), \tag{1.55}$$

which is sometimes called the normal distribution function. The integral
$N(\lambda)$ is the probability that a Gaussian random variable is less than
$x_0 + \lambda\sigma$ (where $x_0$ and $\sigma$ are, respectively, the mean and
standard deviation). Note that $N(\infty) = 1$, as required since the
probability that a Gaussian random variable is less than $\infty$ is obviously
one. Figure 1.6 is a graph of $N(\lambda)$. Comparison with equation (1.54)
shows that
$N(\lambda) = \frac{1}{2} \operatorname{erf}(\lambda/\sqrt{2}) + \frac{1}{2}$.
This can be seen by first noting that the substitution $y = u/\sqrt{2}$ in the
final integral in the derivation (1.54) implies that
$\operatorname{erf}(\lambda/\sqrt{2}) = N(\lambda) - N(-\lambda)$. Secondly, as
will be seen from the hint in Exercise 1.29 below, we have
$N(-\lambda) = 1 - N(\lambda)$. Combining these two results gives the required
expression.


Figure 1.6 The function $N(\lambda)$ is the probability that a normally
distributed variable is less than $x_0 + \lambda\sigma$ (where $x_0$ and
$\sigma$ are the mean and standard deviation)




Both the function $N(\lambda)$ and the error function will be used in later
chapters. Values of $N(\lambda)$ are tabulated below.


Table 1.2

  $\lambda$    $N(\lambda)$
  -4.0         $3.167 \times 10^{-5}$
  -3.0         0.001 350
  -2.5         0.006 210
  -2.0         0.022 76
  -1.5         0.066 81
  -1.0         0.158 7
  -0.5         0.308 5
   0.0         0.500 0
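The tabulated values can be compared with the relation $N(\lambda) = \frac{1}{2}\operatorname{erf}(\lambda/\sqrt{2}) + \frac{1}{2}$, using `math.erf` from the Python standard library. This is a sketch under an assumption: the pairing of each tabulated entry with a value of $\lambda$ in the dictionary `table` is inferred from the numbers themselves, and the function name `N` simply mirrors the notation of the text.

```python
import math


def N(lam):
    """Normal distribution function, via N(lam) = erf(lam / sqrt(2)) / 2 + 1/2."""
    return 0.5 * math.erf(lam / math.sqrt(2.0)) + 0.5


# Tabulated entries, keyed by the value of lambda assumed for each one.
table = {
    -4.0: 3.167e-5,
    -3.0: 0.001350,
    -2.5: 0.006210,
    -2.0: 0.02276,
    -1.5: 0.06681,
    -1.0: 0.1587,
    -0.5: 0.3085,
    0.0: 0.5000,
}

for lam, tabulated in table.items():
    print(lam, tabulated, N(lam))

# The symmetry N(-lam) = 1 - N(lam), checked at an arbitrary point.
symmetry_gap = abs(N(-0.7) - (1.0 - N(0.7)))
```

The computed values agree with the tabulated ones to within the rounding of the table.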


Exercise 1.29 
Calculate $N(1)$ and $N(2)$ using the data in the table above. [Hint: Note that

$$N(-\lambda) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-\lambda} dy\, \exp(-y^2/2) = \frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} du\, \exp(-u^2/2) = 1 - N(\lambda),$$

where the change of variable $y = -u$ was used in the second integral, and the
fact that $N(\infty) = 1$ was used in the last equality.]


Exercise 1.30 


What is the probability that a Gaussian random variable does not exceed its mean 
value by 2.5 multiples of its standard deviation? What is the probability that 
a Gaussian random variable does not differ from its mean value by more than 3 
multiples of its standard deviation? 


1.4 Outcomes 


After studying this chapter you should be able to: 

• understand the concept of probability for discrete events;

• determine probabilities for combinations of mutually exclusive events;

• determine probabilities for successive independent trials;

• understand the notion of a continuous random variable and its description
  using the probability density function;

• understand the joint probability of many discrete or continuous random
  variables;

• understand the joint probability density function for two or more
  continuous random variables;

• calculate the average of observations of a single random variable, and
  understand its relationship to the mean or expectation value of that
  random variable;

• calculate moments, the variance and the standard deviation of a random
  variable;

• calculate the correlation coefficient of two random variables;

• understand the normal (or Gaussian) distribution and its probability
  density function, as well as the error function and normal distribution
  function;

• calculate moments of Gaussian random variables.


1.5 Further Exercises 


The following exercises are harder than those in the main text. The results 
are not used in subsequent chapters, but they will broaden and deepen your 
knowledge of probability. 


Exercise 1.31 


If every couple were to continue having children until they have one boy, what 
would be the ratio of males to females in the next generation? What would be the 
mean number of children per couple? 


Assume that everyone enters into one relationship that produces offspring, and 
that all couples are fertile and can produce an arbitrary number of offspring. Also, 
ignore the existence of twins, triplets, etc.; i.e. assume that all pregnancies produce 
just one child at a time. 


[Hints: Using the obvious notation B = boy, G = girl, the probability of
producing one child (B) is $\frac{1}{2}$, and the probability of producing two
children (G, B) is $\frac{1}{4}$. The probability of producing $n + 1$ children
($n$G, B) is $2^{-(n+1)}$. The mean number of girls per couple is

$$\langle n \rangle = \sum_{n=1}^{\infty} \frac{n}{2^{n+1}}.$$

To find this sum, introduce the geometric series (see Block 0, equation (1.2))

$$S(x) = \sum_{n=0}^{\infty} x^n = \frac{1}{1 - x},$$

and note that the derivative of this sum is in the form of the series required:

$$\frac{d}{dx} S(x) = \sum_{n=1}^{\infty} n x^{n-1}.$$

Setting $x = \frac{1}{2}$, you should conclude that the gender balance and the
size of the population are maintained.]


These exercises are 
optional. 




Exercise 1.32 


Often we might be interested in a series of random events which occur with the 
same average frequency over any long period of time. Examples are the occurrences 
of earthquakes, and the series of clicks from a Geiger counter observing radioactive 
decay. We would like to know the distribution of times between these events. In 
the case where the events are independent of each other, this distribution can be 
calculated by the following approach. 


The probability that an event occurs in a very short interval of time, $\delta t$, is assumed
to be $R\,\delta t$, where $R$ is known as the rate constant. Let $P(t)$ be the probability that
no event will have occurred after time $t$. Because the events are independent, the
probability that no event will have occurred after time $t + \delta t$ is $P(t)$ multiplied by
the probability of a small time interval $\delta t$ passing with no event, namely $1 - R\,\delta t$.
It follows that

$$P(t + \delta t) = P(t)\,(1 - R\,\delta t).$$

(a) Deduce a differential equation for $P(t)$, and show that

$$P(t) = \exp(-Rt).$$

This is known as the Poisson distribution.
Show that the mean time between events is $\langle t \rangle = 1/R$.


(b) A cyclist travels 20km per day, and notices that she suffers 15 punctures in 
100 days. Estimate the probability that she will complete a 600km cycling 
holiday without a puncture. 


Solutions to Exercises in Chapter 1


Solution 1.1


No solution is given, as your results will differ in detail from anyone else’s. 


Solution 1.2 


The number of picture cards is 12 (a King, a Queen and a Jack in each of the 4
suits). The probability of drawing any one of 52 cards is 1/52. The probability of
drawing any picture card is $12 \times \frac{1}{52} = 3/13$.


Solution 1.3 


The total number of deaths was 575 200. The table enables us to estimate the
probability of death at ages 25-44 as $P_{25-44} = 14\,600/575\,200 \approx 0.0254$. Similarly,
$P_{45-64} = 88\,200/575\,200 \approx 0.1533$. The events corresponding to death occurring
in either of these age ranges are mutually exclusive, so the probability of death
occurring in the age range 25-64 is the sum of these values:

$$P_{25-64} \approx 0.0254 + 0.1533 \approx 0.179.$$

(Alternatively, the total number of deaths at ages between 25 and 64 is
$N_{25-64} = N_{25-44} + N_{45-64} = 14\,600 + 88\,200 = 102\,800$. Hence
$P_{25-64} = 102\,800/575\,200 \approx 0.179$.)


This is a good estimate of the probability that a death occurring in the next year 
will occur between ages 25 and 64. Factors influencing longevity (such as medical 
advances and wars) change over time, so this may not be a good estimate that 
someone born this year will die between ages 25 and 64. 


Solution 1.4 


On throwing a single die, the probability of obtaining a 1 or a 2 is
$P_{1\,\text{or}\,2} = \frac{2}{6} = \frac{1}{3}$. The probability of throwing two dice with each
showing less than three is

$$P = \frac{1}{3} \times \frac{1}{3} = \frac{1}{9}.$$

As a check, consider the possible combinations of two throws. There are 6 × 6 = 36
possible combinations. Of these, only the following four have no number greater
than two: (1,1), (1,2), (2,1), (2,2). The required probability is $\frac{4}{36} = \frac{1}{9}$, confirming
the result obtained from equation (1.8).


Solution 1.5 


The weather is unrelated to the convention determining which days of the week 
are available for leisure, so the formula for independent events (equation (1.8)) 
is applicable. The probability of a day falling on a weekend is 2/7. Thus the
probability of a day falling on a weekend and also being suitable for the sport is
$4/10 \times 2/7 = 8/70 \approx 0.114$.


The number of days available for this sport every year is therefore approximately 
365 x 0.114, that is, approximately 42 days per year. 


Solution 1.6 


A total of 7 can be obtained in 6 ways: 1+6, 2+5,3+4,4+3,5+2,6+1. 
There are 6 x 6 = 36 possible outcomes from throwing the two dice. Since each 
possible outcome has the same probability, each combination of numbers has prob- 
ability 1/36. There are six different outcomes for which the total is 7, and these 
are mutually exclusive, so the probabilities add. The probability of the total being 
equal to 7 is therefore 6/36 = 1/6. 
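
The counting argument above can be checked by enumerating all 36 outcomes. The following Python sketch is only an illustration of that enumeration; the function name `prob_total` is our own choice and does not appear in the course text.

```python
from fractions import Fraction
from itertools import product

def prob_total(total, sides=6):
    """Exact probability that two fair dice show the given total."""
    outcomes = list(product(range(1, sides + 1), repeat=2))  # 36 equally likely pairs
    favourable = [pair for pair in outcomes if sum(pair) == total]
    # Mutually exclusive outcomes: their probabilities add.
    return Fraction(len(favourable), len(outcomes))

print(prob_total(7))  # 1/6
```

Working with `Fraction` keeps the result exact, so it can be compared directly with 6/36 = 1/6.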




Solution 1.7 


Let the constant in equation (1.16) be denoted by $C$. The normalisation
condition (1.14) is

$$\int_{-\infty}^{\infty} dx\, p(x) = \int_{x_1}^{x_2} dx\, C = C(x_2 - x_1) = 1,$$

so $C = 1/(x_2 - x_1)$.


Solution 1.8 


The angle at which the tyre valve comes to rest when the aircraft is parked is
random. Since the wheel is circular, no direction is preferred, and the angle $\theta$ with
respect to the vertical should have a uniform distribution $p(\theta) = \text{constant} = 1/360$,
in the range $-180^\circ < \theta < 180^\circ$.


The probability that the angle is between $\theta_1 = -20^\circ$ and $\theta_2 = +20^\circ$ is

$$P = \int_{\theta_1}^{\theta_2} d\theta\, p(\theta) = \int_{-20}^{+20} \frac{d\theta}{360} = \frac{40}{360} = \frac{1}{9}.$$


The probability that the aircraft will have to be moved to check the tyre pressure
is therefore $\frac{1}{9}$.


Solution 1.9 


(a) Since $p(t) = 0$ for $t < 0$, the normalisation condition (1.14) is

$$1 = \int_0^{\infty} dt\, p(t) = A \int_0^{\infty} dt \exp(-t/t_0) = A t_0.$$

(The substitution $u = t/t_0$ was used to evaluate the integral.)
The probability density is therefore correctly normalised by setting $A = 1/t_0$.


(b) The probability $P$ of waiting longer than a time $T$ is the area under the graph
of $p(t)$ between $T$ and $\infty$:

$$P = \frac{1}{t_0} \int_T^{\infty} dt \exp(-t/t_0) = \exp(-T/t_0).$$

The probability of waiting longer than $T = 45$ minutes is therefore

$$P = \exp(-45/15) = \exp(-3) \approx 0.050.$$

There is roughly a one in twenty chance that the passenger will walk home if
the buses are running normally.


Solution 1.10


The joint probability density is

$$p(t_1, t_2) = \frac{1}{t_0} \exp(-t_1/t_0) \times \frac{1}{t_0} \exp(-t_2/t_0) = \frac{\exp[-(t_1 + t_2)/t_0]}{t_0^2},$$

where $t_0 = 15$ minutes. The probability $P$ that both waiting times are less than $t_0$
is obtained by integrating over the square domain $0 < t_1 < t_0$, $0 < t_2 < t_0$:

$$\begin{aligned}
P &= \int_0^{t_0} dt_1 \int_0^{t_0} dt_2\, p(t_1, t_2) \\
  &= \frac{1}{t_0^2} \int_0^{t_0} dt_1 \exp(-t_1/t_0) \int_0^{t_0} dt_2 \exp(-t_2/t_0) \\
  &= \frac{1}{t_0} \int_0^{t_0} dt_1 \exp(-t_1/t_0)\,[1 - \exp(-1)] \\
  &= [1 - \exp(-1)]^2.
\end{aligned}$$

(The substitution $u = t/t_0$ was used to simplify each integral.)
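
The result $[1 - \exp(-1)]^2$ can be checked numerically. The sketch below is an illustration only: it assumes that waiting times can be drawn with Python's `random.expovariate`, and the function names and the number of trials are our own choices.

```python
import math
import random

T0 = 15.0  # mean waiting time in minutes, as in the solution

def exact_both_short():
    """Closed form [1 - exp(-1)]^2 for both waits being shorter than T0."""
    return (1.0 - math.exp(-1.0)) ** 2

def simulate_both_short(trials=200_000, seed=1):
    """Fraction of simulated pairs of waits that are both shorter than T0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        t1 = rng.expovariate(1.0 / T0)  # exponential density with mean T0
        t2 = rng.expovariate(1.0 / T0)  # independent of t1
        if t1 < T0 and t2 < T0:
            hits += 1
    return hits / trials

print(exact_both_short(), simulate_both_short())
```

The two numbers should agree to within the statistical scatter of the simulation, which shrinks as the number of trials grows.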


Solution 1.11


The joint probability density is the same as in Exercise 1.10, and in this case the
region of integration is a triangle in the $(t_1, t_2)$-plane where $t_1 > 0$, $t_2 > 0$ and
$t_1 + t_2 < t_0$ (see Figure 1.7).


Figure 1.7 The region of integration considered in Exercise 1.11

The probability $P$ that $t_1 + t_2 < t_0$ is given by

$$\begin{aligned}
P &= \frac{1}{t_0^2} \int_0^{t_0} dt_1 \exp(-t_1/t_0) \int_0^{t_0 - t_1} dt_2 \exp(-t_2/t_0) \\
  &= \frac{1}{t_0} \int_0^{t_0} dt_1 \exp(-t_1/t_0) \left[-\exp(-u)\right]_0^{1 - t_1/t_0} \\
  &= \int_0^1 dv \exp(-v)\,[1 - \exp(-1 + v)] \\
  &= 1 - 2\exp(-1).
\end{aligned}$$

(Using substitutions $u = t_2/t_0$ and $v = t_1/t_0$.)


Solution 1.12


The probability of the random variables $x$ and $y$ taking any values in the $(x, y)$-plane
is unity. Setting the probability in equation (1.19) equal to unity when $A$ is
the entire $(x, y)$-plane gives the normalisation condition

$$\int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, p(x, y) = 1.$$

Also, equation (1.18) implies that the joint probability density can never be
negative.


Solution 1.13


Let $\delta P$ be the probability that the first variable is between $x$ and $x + \delta x$. Also,
let $j$ be an integer, and let $\delta P_j$ be the probability that both variables are in the
respective small intervals $[x, x + \delta x]$, $[j\,\delta y, (j + 1)\,\delta y]$. Note that $\delta P \approx p_1(x)\,\delta x$ and
$\delta P_j \approx p(x, j\,\delta y)\,\delta x\,\delta y$, and that $\delta P = \sum_j \delta P_j$, so

$$p_1(x)\,\delta x \approx \sum_{j=-\infty}^{\infty} p(x, j\,\delta y)\,\delta x\,\delta y.$$

In the limit as $\delta y \to 0$, the sum becomes an integral, so the probability density for
$x$ alone is the integral of the joint probability density over the other variable:

$$p_1(x) = \int_{-\infty}^{\infty} dy\, p(x, y).$$

When $x$ and $y$ are independent variables, we have $p(x, y) = p_1(x)\,p_2(y)$, where $p_1(x)$
and $p_2(y)$ are the probability densities of $x$ and $y$, respectively. In this case, the
integral gives

$$\int_{-\infty}^{\infty} dy\, p(x, y) = \int_{-\infty}^{\infty} dy\, p_1(x)\,p_2(y) = p_1(x) \int_{-\infty}^{\infty} dy\, p_2(y) = p_1(x),$$

as expected. (The final step used the fact that $p_2(y)$ is normalised,
satisfying equation (1.14).)


Solution 1.14 


For throwing a single die, there are 6 possible outcomes, where the number on top
takes the values $j = 1$ to $j = 6$, all with probability $P_j = 1/6$. The mean value of
$x = j$ is

$$\langle x \rangle = \sum_{j=1}^{6} j\,P_j = \frac{1}{6} \sum_{j=1}^{6} j = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5.$$

Note that the average of the second sequence in Example 1.3 is much closer to this
mean value than is the average of the first (shorter) sequence.


Solution 1.15 


Assuming that $x_2 > x_1$, the probability density is $1/(x_2 - x_1)$ in the interval $[x_1, x_2]$,
and zero outside it (see Exercise 1.7). Then integral (1.30) becomes

$$\langle x \rangle = \frac{1}{x_2 - x_1} \int_{x_1}^{x_2} dx\, x = \frac{x_2^2 - x_1^2}{2(x_2 - x_1)} = \frac{x_1 + x_2}{2}.$$


Solution 1.16 


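Remembering that the density $p(x)$ is zero for $x < 0$, the mean value is

$$\begin{aligned}
\langle x \rangle &= \frac{1}{x_0} \int_0^{\infty} dx\, x \exp(-x/x_0) \\
  &= x_0 \int_0^{\infty} du\, u \exp(-u) \\
  &= x_0 \left[-u \exp(-u)\right]_0^{\infty} + x_0 \int_0^{\infty} du \exp(-u) \\
  &= (x_0 \times 0) + (x_0 \times 1),
\end{aligned}$$

so $\langle x \rangle = x_0$. (The calculation used the change of variable $u = x/x_0$,
followed by an integration by parts.)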


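Solution 1.17

The probability density function is

$$p(x) = \begin{cases} 1, & 0 \le x \le 1, \\ 0, & \text{otherwise.} \end{cases}$$

The moments are therefore

$$M_n = \int_{-\infty}^{\infty} dx\, x^n\, p(x) = \int_0^1 dx\, x^n = \frac{1}{n + 1}.$$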


Solution 1.18 


In the case of a discrete distribution, by analogy with equation (1.28) the expression
for $\langle x^2 \rangle$ is

$$\langle x^2 \rangle = \sum_{j} x_j^2\, P_j.$$

If $x$ is the number obtained from a throw of a die, this takes discrete values from
$j = 1$ to $j = 6$, with $P_j = \frac{1}{6}$. So the mean value of $x^2$ is

$$\langle x^2 \rangle = \sum_{j=1}^{6} \frac{j^2}{6} = \frac{1 + 4 + 9 + 16 + 25 + 36}{6} = \frac{91}{6} \approx 15.17.$$
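
The discrete sums in Solutions 1.14 and 1.18 are easy to reproduce exactly. The following Python sketch is an illustration only; the helper name `die_moment` is our own, and the variance printed at the end is an extra quantity (second moment minus the square of the mean) that is not asked for in the exercise.

```python
from fractions import Fraction

def die_moment(k, sides=6):
    """Exact k-th moment <x^k> of a fair die: the sum of j^k * P_j with P_j = 1/sides."""
    return sum(Fraction(j ** k, sides) for j in range(1, sides + 1))

mean = die_moment(1)            # 7/2
second = die_moment(2)          # 91/6
variance = second - mean ** 2   # 35/12
print(mean, second, float(second), variance)
```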


Solution 1.19


From the result of Exercise 1.17, the first and second moments are $M_1 = \frac{1}{2}$, $M_2 = \frac{1}{3}$.
The variance is therefore $\mathrm{Var}(x) = M_2 - M_1^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}$.


Solution 1.20 


The expression for the mean value of $x$ in terms of its probability density $p_1(x)$ is

$$\langle x \rangle = \int_{-\infty}^{\infty} dx\, x\, p_1(x),$$

using the notation from Exercise 1.13. The probability density $p_1(x)$ is given in
terms of $p(x, y)$ in the solution to Exercise 1.13. Inserting this expression gives

$$\langle x \rangle = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, x\, p(x, y).$$


Solution 1.21


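If $x$ and $y$ are independent, their joint probability density factorises, and may be
written as $p(x, y) = p_1(x)\,p_2(y)$ where $p_1$ and $p_2$ are the probability densities for $x$
and $y$, respectively. The mean value of the product $xy$ is then

$$\begin{aligned}
\langle xy \rangle &= \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, xy\, p(x, y) \\
  &= \int_{-\infty}^{\infty} dx\, x\, p_1(x) \int_{-\infty}^{\infty} dy\, y\, p_2(y) \\
  &= \langle x \rangle \langle y \rangle.
\end{aligned}$$

So the correlation coefficient is $C_{xy} = \langle xy \rangle - \langle x \rangle \langle y \rangle = 0$.


Solution 1.22

Comparing equations (1.39) and (1.32), we have

$$C_{xx} = \langle (x - \langle x \rangle)(x - \langle x \rangle) \rangle = \mathrm{Var}(x).$$

Alternatively, comparing equations (1.40) and (1.34), we have
$C_{xx} = M_2 - M_1^2 = \mathrm{Var}(x)$.


Solution 1.23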


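For two successive throws, yielding values $i$ on the first throw and $j$ on the second,
all combinations of two numbers have equal probability, namely $P_{i,j} = 1/36$. The
mean value of the product $xy = i \times j$ is

$$\langle xy \rangle = \sum_{i=1}^{6} \sum_{j=1}^{6} P_{i,j}\, i \times j = \frac{1}{36} \sum_{i=1}^{6} \sum_{j=1}^{6} i \times j = \frac{21 \times 21}{36} = \frac{49}{4}.$$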


Solution 1.24 


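Using the substitutions $u = x$ and $dv/dx = x \exp(-x^2)$ (so $v = -\exp(-x^2)/2$) in
the formula for integration by parts,

$$\int dx\, u \frac{dv}{dx} = [uv] - \int dx\, v \frac{du}{dx},$$

gives

$$I_2 = \int_{-\infty}^{\infty} dx\, x^2 \exp(-x^2) = \left[-\frac{x \exp(-x^2)}{2}\right]_{-\infty}^{\infty} + \frac{1}{2} \int_{-\infty}^{\infty} dx \exp(-x^2) = \frac{\sqrt{\pi}}{2}.$$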


Solution 1.25 


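Using the substitutions $u = x^{n-1}$ and $dv/dx = x \exp(-x^2)$ in the formula for
integration by parts gives

$$\begin{aligned}
I_n &= \int_{-\infty}^{\infty} dx\, x^n \exp(-x^2) \\
  &= \left[-\frac{x^{n-1} \exp(-x^2)}{2}\right]_{-\infty}^{\infty} + \frac{n - 1}{2} \int_{-\infty}^{\infty} dx\, x^{n-2} \exp(-x^2) \\
  &= \frac{n - 1}{2}\, I_{n-2}.
\end{aligned}$$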
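
The recurrence $I_n = \frac{n-1}{2} I_{n-2}$ can be compared with a direct numerical integration. The sketch below is an illustration under stated assumptions: it takes $I_0 = \sqrt{\pi}$ as the starting value, handles only even $n$, and uses a simple midpoint rule on a truncated interval; the function names are our own.

```python
import math

def gaussian_moment_integral(n):
    """I_n for even n >= 0, built from I_0 = sqrt(pi) and I_n = (n - 1)/2 * I_{n-2}."""
    if n < 0 or n % 2:
        raise ValueError("n must be a non-negative even integer")
    value = math.sqrt(math.pi)
    for m in range(2, n + 1, 2):
        value *= (m - 1) / 2  # one application of the recurrence
    return value

def numeric_integral(n, half_width=10.0, steps=20_000):
    """Midpoint-rule estimate of the integral of x^n exp(-x^2) on [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += x ** n * math.exp(-x * x)
    return total * h

for n in (0, 2, 4):
    print(n, gaussian_moment_integral(n), numeric_integral(n))
```

For $n = 2$ both columns should be close to $\sqrt{\pi}/2 \approx 0.886$, the value used in Solution 1.27.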


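Solution 1.26


If the probability density is normalised, we have

$$\int_{-\infty}^{\infty} dx\, A \exp[-\alpha (x - x_0)^2] = 1.$$

Using the change of variable $u = \sqrt{\alpha}\,(x - x_0)$, so that $du/dx = \sqrt{\alpha}$, we obtain

$$A \int_{-\infty}^{\infty} dx \exp[-\alpha (x - x_0)^2] = \frac{A}{\sqrt{\alpha}} \int_{-\infty}^{\infty} du \exp(-u^2) = A \sqrt{\frac{\pi}{\alpha}}.$$

Thus we must have $A = \sqrt{\alpha/\pi}$ in order for the probability density function to be
normalised.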


Solution 1.27 


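To calculate the mean value, use the substitution $u = \sqrt{\alpha}\,(x - x_0)$:

$$\begin{aligned}
\langle x \rangle &= \int_{-\infty}^{\infty} dx\, x\, p(x) \\
  &= \sqrt{\frac{\alpha}{\pi}} \int_{-\infty}^{\infty} dx\, x \exp[-\alpha (x - x_0)^2] \\
  &= \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} du \left(\frac{u}{\sqrt{\alpha}} + x_0\right) \exp(-u^2) \\
  &= \frac{1}{\sqrt{\alpha \pi}} \int_{-\infty}^{\infty} du\, u \exp(-u^2) + \frac{x_0}{\sqrt{\pi}} \int_{-\infty}^{\infty} du \exp(-u^2) \\
  &= 0 + x_0 \\
  &= x_0.
\end{aligned}$$

(The first integral in the fourth line vanishes because the integrand is odd.)

The same change of variable helps to calculate the variance:

$$\begin{aligned}
\sigma^2 &= \langle (x - \langle x \rangle)^2 \rangle \\
  &= \langle (x - x_0)^2 \rangle \\
  &= \sqrt{\frac{\alpha}{\pi}} \int_{-\infty}^{\infty} dx\, (x - x_0)^2 \exp[-\alpha (x - x_0)^2] \\
  &= \frac{1}{\alpha \sqrt{\pi}} \int_{-\infty}^{\infty} du\, u^2 \exp(-u^2) \\
  &= \frac{1}{2\alpha},
\end{aligned}$$

using $I_2$ as calculated in Exercise 1.24.


Solution 1.28

Using the substitution $u = (x - x_0)/\sqrt{2\sigma^2}$ and relation (1.51), we have

$$\langle (x - x_0)^4 \rangle = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} dx\, (x - x_0)^4 \exp[-(x - x_0)^2 / 2\sigma^2] = \frac{4\sigma^4}{\sqrt{\pi}} \int_{-\infty}^{\infty} du\, u^4 \exp(-u^2) = 3\sigma^4.$$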


Solution 1.29 


From the hint we have $N(x) = 1 - N(-x)$. Using the data in the table gives
$N(1) = 1 - N(-1) = 1 - 0.1587 = 0.8413$. Similarly, $N(2) = 1 - 0.022\,76 = 0.977\,24$.


Solution 1.30 


It follows from the description of the function $N(x)$ given beneath equation (1.55)
that the probability that a Gaussian random variable does not exceed its mean by
more than 2.5 multiples of its standard deviation is $N(2.5) = 1 - N(-2.5)$. From
the table, this gives $N(2.5) = 1 - 0.006\,21 = 0.993\,79$.


One can see from equation (1.55), and the discussion following it, that the
probability that a Gaussian random variable has its value within three standard deviations
of its mean is given by $N(3) - N(-3) = 1 - 2N(-3) = 1 - 2 \times 0.001\,35 = 0.9973$.
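
The tabulated values used in Solutions 1.29 and 1.30 can be reproduced from the error function. The sketch below is an illustration that assumes $N(x)$ is the distribution function of a Gaussian variable with zero mean and unit standard deviation, so that $N(x) = \frac{1}{2}[1 + \mathrm{erf}(x/\sqrt{2})]$; the function name `normal_cdf` is our own.

```python
import math

def normal_cdf(x):
    """N(x): probability that a standard Gaussian variable does not exceed x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(round(normal_cdf(1.0), 4))                      # 0.8413
print(round(normal_cdf(2.5), 4))                      # 0.9938
print(round(normal_cdf(3.0) - normal_cdf(-3.0), 4))   # 0.9973
```

The symmetry $N(x) = 1 - N(-x)$ quoted in the hint holds for this function because the error function is odd.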


Solution 1.31 


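The probability of producing a boy is $\frac{1}{2}$, and the probability of producing a girl
is $\frac{1}{2}$. The probability of producing $n$ girls and 1 boy is
$P(n) = \left(\frac{1}{2}\right)^n \times \frac{1}{2} = \left(\frac{1}{2}\right)^{n+1}$. The
mean number of girls is

$$\langle n \rangle = \sum_{n=1}^{\infty} n\, P(n) = \sum_{n=1}^{\infty} n \left(\frac{1}{2}\right)^{n+1}.$$

Now note that this sum is closely related to the derivative of $S(x)$ given in the
question:

$$-\frac{d}{dx} S(x) = \sum_{n=1}^{\infty} \frac{n}{x^{n+1}} = \frac{1}{(x - 1)^2}.$$

Setting $x = 2$ gives

$$\langle n \rangle = \sum_{n=1}^{\infty} \frac{n}{2^{n+1}} = \frac{1}{(2 - 1)^2} = 1.$$

The mean number of girls per couple is $\langle n \rangle = 1$, and there is always one boy, so the
ratio of male to female births is unity.


Because the mean number of children produced by each couple is two (1 boy and
$\langle n \rangle = 1$ girls), the population size is expected to remain constant.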
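
Both the series and the conclusion can be checked numerically. The following Python sketch is an illustration only: the function names, the number of terms and the number of simulated couples are our own choices, and each birth is assumed to be a girl with probability 1/2, independently of the others.

```python
import random

def partial_sum(terms=60):
    """Partial sum of n / 2^(n+1), which should approach 1."""
    return sum(n / 2 ** (n + 1) for n in range(1, terms + 1))

def simulate_families(couples=100_000, seed=0):
    """Each couple has children until the first boy; returns (boys, girls)."""
    rng = random.Random(seed)
    boys = girls = 0
    for _ in range(couples):
        while rng.random() < 0.5:  # a girl, with probability 1/2
            girls += 1
        boys += 1                  # the first boy ends the family
    return boys, girls

boys, girls = simulate_families()
print(partial_sum(), girls / boys)
```

The ratio of girls to boys in the simulation should be close to 1, with a scatter that decreases as more couples are simulated.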




Solution 1.32 


(a) Writing

$$P(t + \delta t) = P(t) + P'(t)\,\delta t + O(\delta t^2)$$

and comparing with the expression for $P(t + \delta t)$ given in the question, we see
that

$$\frac{dP}{dt} = -RP.$$

Integrating this for $P > 0$ gives

$$\ln P = -Rt + c, \quad \text{or equivalently} \quad P(t) = C \exp(-Rt),$$

where $c$ and $C$ are constants. The constant $C$ is determined by noting that at
$t = 0$ it is certain that no events have occurred, so $P(0) = 1$, giving $C = 1$.


The mean time between events is

$$\langle t \rangle = \int_0^{\infty} dt\, t\, p(t),$$

where $p(t)$ is the probability density for events to be separated by time $t$. (Note
that $p(t)$ is zero for negative $t$, so the lower limit of the integral may be set to
zero.) The following argument can be used to determine $p(t)$.


First, let $P(t_1, t_2)$ be the probability of observing an event between times
$t_1$ and $t_2$ (assuming $t_2 > t_1 > 0$). Note that we have $P(t_1, t_3) = P(t_1, t_2) + P(t_2, t_3)$
(with $t_3 > t_2 > t_1$). Now, since $P(t) + P(0, t) = 1$, we have already
determined that $P(0, t) = 1 - \exp(-Rt)$, so we have

$$\begin{aligned}
P(t_1, t_2) &= P(0, t_2) - P(0, t_1) \\
  &= [1 - \exp(-Rt_2)] - [1 - \exp(-Rt_1)] \\
  &= \exp(-Rt_1)\,(1 - \exp[-R(t_2 - t_1)]).
\end{aligned}$$

It follows from the definitions of $p(t)$ and $P(t_1, t_2)$ that $P(t, t + \delta t) = p(t)\,\delta t + O(\delta t^2)$, so

$$p(t) = \lim_{\delta t \to 0} \frac{P(t, t + \delta t)}{\delta t} = R \exp(-Rt).$$

The mean time between events is therefore

$$\langle t \rangle = R \int_0^{\infty} dt\, t \exp(-Rt) = \frac{1}{R}.$$

(b) The rate at which a puncture occurs is $R = 15/100$ per day. It takes 30 days
for the cyclist to travel 600 km given that she cycles 20 km per day. Hence, the
probability of not getting a puncture after cycling 600 km is

$$P(30) = \exp(-30R) = \exp(-30 \times 15/100) = \exp(-4.5) \approx 0.011.$$

So the cyclist is very likely to have punctures.
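
Both parts of this solution can be checked numerically. The sketch below is an illustration only: it assumes that the times between events can be sampled with Python's `random.expovariate`, which draws from the density $R \exp(-Rt)$, and the function names are our own.

```python
import math
import random

def prob_no_event(rate, duration):
    """P(t) = exp(-R t): probability that no event occurs up to time t."""
    return math.exp(-rate * duration)

def mean_waiting_time(rate, samples=100_000, seed=2):
    """Sample mean of waiting times drawn from the density R exp(-R t)."""
    rng = random.Random(seed)
    return sum(rng.expovariate(rate) for _ in range(samples)) / samples

R = 15 / 100  # punctures per day, as estimated in part (b)
print(prob_no_event(R, 30))   # about 0.011
print(mean_waiting_time(R))   # close to 1/R, about 6.7 days
```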




CHAPTER 2 


Discrete random functions and 
random walks 


2.1 Introduction 


The Introduction to this Block listed some processes and phenomena, illus- 
trated in Figures 0.1 to 0.7, which can be modelled by random functions. 
This chapter discusses some discrete models for these random functions, 
f(n), which are defined only for integer values of n. For each value of the 
integer n, the value of the function f(n) is defined by means of a random 
process (that is, by a sequence of operations which involves random ele- 
ments, such as tossing coins, or using computer programs which generate 
apparently random numbers). Later, in Chapter 6, there will be a discussion 
of how these definitions are extended so that one type of random function, 
the random walk, can be defined for all real values of its argument. 


The concept of a random function is just an extension of the idea of a ran- 
dom number. Random numbers are generated by using a random process to 
select a number from a specified set (such as the set of real numbers, or the 
set of positive integers). The random process gives a certain probability for 
selecting a particular value for the random number (or it gives a probability 
density, if the numbers are selected from a continuous set). Random func- 
tions are constructed in the same manner: a random process has a certain 
probability (or probability density) for selecting a function from a given set. 


In general, it is a difficult task to define sets of functions, and to associate 
functions in these sets with probabilities. The approach that we shall use 
in this chapter is to define one very elementary type of random function, 
which we shall call the coin-tossing function. The coin-tossing function is 
then used to derive two other types of random function, called the random 
walk and the correlated random function, which find applications as models 
of random processes. 


The chapter starts by introducing the coin-tossing function in Section 2.2, 
and this will be used (in Section 2.3) to construct random walks. The 
random walk is a key concept in this course because of its relation to diffusion 
processes, and its properties are examined in some detail in Sections 2.4 
to 2.7. Finally, Section 2.8 describes the correlated random function. 




2.2 An elementary random function 


2.2.1 Defining the coin-tossing function 


We start by considering the simplest random function, which we call the
coin-tossing function. (The name ‘coin-tossing function’ is not standard
mathematical terminology.) It is a mapping from a set of consecutive positive
and negative integers $\{-N, \ldots, -2, -1, 0, 1, 2, 3, \ldots, N-1, N\}$ to the set
$\{-1, 1\}$, constructed by the following procedure. For each value of $n$ chosen
from the first set, toss a coin. If it falls heads up, assign the value $+1$ to the
function at $n$; otherwise, assign $f(n) = -1$. Then pick the next integer $n$,
and repeat for all the elements of the first set. (We could pick the values of $n$
in sequence, but the order in which we take them is, in fact, irrelevant.) By
increasing $N$, this procedure will define the function $f(n)$ for an arbitrarily
large range of integers.


The function will depend upon how the coin falls with each throw, and
repeating the entire procedure will produce different examples of the random
function. Sometimes these are described as different realisations. Two
examples (i.e. different realisations) are shown in Figure 2.1. The procedure
which generates a realisation of the random function is called a stochastic
process. (The word ‘stochastic’ has the same meaning as ‘random’.) We shall
use the term ‘random function’ to refer to both the process that generates the
function and one of its realisations. For example, when we refer to the
coin-tossing function, it will be clear from the context whether we mean a
particular realisation of this random function or the stochastic process which
generates such realisations.


Figure 2.1 Two realisations of the coin-tossing function, defined for the range
$n = -5$ to $n = +5$


The coin-tossing function will be used to generate other types of random
function. Unless otherwise stated, we shall assume that the limit as $N \to \infty$
is taken, so that $f(n)$ is defined (randomly) for every $n$.
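
The procedure for constructing a realisation translates directly into a few lines of code. The following Python sketch is an illustration only: it replaces the physical coin by a pseudo-random number generator, and the function name and the choice of a dictionary to store $f(n)$ are our own.

```python
import random

def coin_tossing_function(N, rng=None):
    """One realisation f(n) for n = -N, ..., N, stored as a dict n -> +1 or -1."""
    rng = rng or random.Random()
    # 'Heads' (probability 1/2) gives +1; otherwise the value is -1.
    return {n: (1 if rng.random() < 0.5 else -1) for n in range(-N, N + 1)}

# Two realisations on the range n = -5 to n = +5, as in Figure 2.1.
f = coin_tossing_function(5, random.Random(0))
g = coin_tossing_function(5, random.Random(1))
print([f[n] for n in range(-5, 6)])
print([g[n] for n in range(-5, 6)])
```

Passing a seeded generator makes a realisation reproducible; calling the function without one gives a fresh realisation each time.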




2.2.2 Statistics of the coin-tossing function 


In Chapter 1 we discussed random events in terms of their probabilities,
and of course we can discuss random functions using the same approach.
For example, given a random function (such as the coin-tossing function)
that takes a discrete set of values $f(n)$ on a finite set of points (such as
$n = -N, \ldots, 0, 1, \ldots, N$), we can assign a probability for a given realisation
of the random function. (In cases where $f(n)$ has a continuous range of
values, a probability density must be specified.) However, this approach is
very cumbersome, because of the large number of variables required. (For
example, the coin-tossing function defined for the integers from $-N$ to $+N$
has a probability which is defined in terms of $2N + 1$ random variables,
namely the values $f(n)$.) It is preferable to discuss random functions in
terms of their statistics. The remainder of this section will describe the
statistical properties of the coin-tossing function.


Simple statistics give a clearer understanding of the properties of a random
function than can be obtained by considering probabilities. Also, we shall
see later that statistics can often be evaluated without writing down
probabilities or, for continuous cases, probability density functions. The simplest
statistics describing a random function $f(n)$ are the mean $\langle f(n) \rangle$ and the
second moment $\langle f^2(n) \rangle$, discussed in Chapter 1, Section 1.2.


We use angular brackets to denote expectation values in the same way as in
Chapter 1, but it may be helpful to review what the notation $\langle f(n) \rangle$ means.
If we make $M$ observations of a random variable $X$, then the average of
these observations approaches $\langle X \rangle$ in the limit as $M \to \infty$. The quantities
$\langle f(n) \rangle$ and $\langle f^2(n) \rangle$ are defined in the same way. If we generate $M$ different
realisations of a random function $f$, then we can determine the average of
$f$ for some given choice of the integer argument $n$, that is, the average of
$f(n)$. This average approaches the expectation value $\langle f(n) \rangle$ in the limit
as $M \to \infty$. Similarly, if we calculate the average of $f^2$ at some given
value of $n$ over $M$ realisations of the random function, then this average
approaches the expectation value $\langle f^2(n) \rangle$ in the limit as $M \to \infty$. There
is no fundamental difference between calculating the expectation value of
a single random variable, $\langle X \rangle$ say, and the expectation value of a random
function with the argument set equal to $n$, namely $\langle f(n) \rangle$. In both cases,
equation (1.28) or (1.30) can be used, depending on whether the quantity
that we are averaging has a discrete or a continuous range of values.


In general, both of these expectation values, $\langle f(n) \rangle$ and $\langle f^2(n) \rangle$, will depend
upon $n$. In the case of the coin-tossing function, however, $\langle f(n) \rangle$ and $\langle f^2(n) \rangle$
do not depend upon $n$, because the procedure used to generate the random
number $f(n)$ is the same for all choices of $n$.


There is an alternative approach to defining expectation values for random
functions. Given a particular realisation of a random function $f(n)$ which,
in this case, is defined on all positive integer values $n$, we could calculate an
average over $n$, such as

$$f_{\mathrm{av}} = \frac{1}{N} \sum_{n=1}^{N} f(n) \tag{2.1}$$

and consider the limit of this quantity as $N \to \infty$. This average, based on a
single realisation, need not be the same as $\langle f(n) \rangle$, which is defined in terms of
an average over realisations. All of our discussions will consider averages over
different realisations.


The correlation coefficients $\langle f(n_1) f(n_2) \rangle - \langle f(n_1) \rangle \langle f(n_2) \rangle$ are, in general,
dependent upon $n_1$ and $n_2$. The set of these correlation coefficients is
regarded as being a function of $n_1$ and $n_2$, called the correlation function
$C(n_1, n_2)$:

$$C(n_1, n_2) = \langle f(n_1) f(n_2) \rangle - \langle f(n_1) \rangle \langle f(n_2) \rangle. \tag{2.2}$$

The correlation function is positive if the values of $f(n_2) - \langle f(n_2) \rangle$ tend to
have the same sign as $f(n_1) - \langle f(n_1) \rangle$, and negative if they tend to have
opposite signs. The correlation function is zero if the values of $f(n_1)$ and
$f(n_2)$ are independent. When we investigate the statistics of the random
walk in Section 2.4, it will become clear that it is very useful to know the
correlation function of a random function.


The task of calculating mean values and correlation functions is greatly
simplified if all of the values $f(n)$ are independent, as is the case for the
coin-tossing function. These statistics will now be calculated for the
coin-tossing function $f(n)$, which takes values $\pm 1$. Recall that the mean value
of a random variable is the sum of the possible values of the variable, with
each term in the sum multiplied by the probability that this value occurs.
In this case, the values $\pm 1$ occur with probability $P(\pm 1) = \frac{1}{2}$, so the mean
value of the coin-tossing function is

$$\langle f(n) \rangle = (+1)\,P(+1) + (-1)\,P(-1) = \tfrac{1}{2} - \tfrac{1}{2} = 0. \tag{2.3}$$

Now let us consider the correlation function for the coin-tossing function.
Because $\langle f(n) \rangle = 0$ for all $n$, the correlation function is given by
$C(n_1, n_2) = \langle f(n_1) f(n_2) \rangle$, which is the mean of the function evaluated at $n_1$ multiplied by
the function evaluated at $n_2$. We can see immediately that $\langle f(n_1) f(n_2) \rangle = 0$
when $n_1 \ne n_2$, because the values $f(n_1)$ and $f(n_2)$ are independent. Also,
we see that $\langle f(n_1) f(n_2) \rangle = 1$ when $n_1 = n_2$, because $f^2(n)$ is always equal
to 1. It follows that

$$\langle f(n_1) f(n_2) \rangle = \begin{cases} 0, & \text{for } n_1 \ne n_2, \\ 1, & \text{for } n_1 = n_2, \end{cases} \;=\; \delta_{n_1, n_2} \tag{2.4}$$

(where $\delta_{n_1, n_2}$ is the Kronecker delta symbol).


We can confirm equation (2.4) by a direct calculation of this correlation
coefficient using equation (1.41). This requires the joint probability for the function
at values $n_1$ and $n_2$, which will be defined as follows: $P(s_1, s_2, n_1, n_2)$ is the
probability that $f(n_1)$ takes the value $s_1$ and $f(n_2)$ takes the value $s_2$. The
values of the function at $n_1$ and $n_2$ are independent (when $n_1 \ne n_2$), so (according
to equation (1.8)) their probabilities multiply. Provided that $n_1 \ne n_2$, and
writing $P(s, n)$ for the probability for $f(n) = s$, we have
$P(s_1, s_2, n_1, n_2) = P(s_1, n_1)\,P(s_2, n_2) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$, for any combination of $s_1$ and $s_2$. When
$n_1 = n_2$, $s_1$ and $s_2$ are the same variable, so $P(s_1, s_2, n, n) = 0$ if $s_1 \ne s_2$ (because
it is impossible for $f(n)$ to take two different values for a given realisation),
whereas $P(s_1, s_2, n, n) = \frac{1}{2}$ if $s_1 = s_2$, because $s_1$ can be either $+1$ or $-1$ with
equal probability. Using these results in equation (1.41), it follows that

$$\begin{aligned}
\langle f(n_1) f(n_2) \rangle &= \sum_{s_1 = \pm 1} \sum_{s_2 = \pm 1} P(s_1, s_2, n_1, n_2)\, s_1 s_2 \\
  &= (+1)(+1)\,P(+1, +1, n_1, n_2) + (-1)(-1)\,P(-1, -1, n_1, n_2) \\
  &\quad + (+1)(-1)\,P(+1, -1, n_1, n_2) + (-1)(+1)\,P(-1, +1, n_1, n_2) \\
  &= \begin{cases} \frac{1}{4} + \frac{1}{4} - \frac{1}{4} - \frac{1}{4}, & \text{for } n_1 \ne n_2, \\ \frac{1}{2} + \frac{1}{2} - 0 - 0, & \text{for } n_1 = n_2, \end{cases} \\
  &= \delta_{n_1, n_2}.
\end{aligned} \tag{2.5}$$
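
The statement that expectation values are averages over many realisations can be tried out directly. The sketch below is an illustration only: the function names, the number of realisations and the use of a pseudo-random generator in place of a coin are our own assumptions. It estimates $\langle f(n_1) \rangle$ and $\langle f(n_1) f(n_2) \rangle$ for the coin-tossing function.

```python
import random

def coin_tossing_function(N, rng):
    """One realisation: a dict mapping n = -N, ..., N to +1 or -1."""
    return {n: rng.choice((-1, 1)) for n in range(-N, N + 1)}

def estimate_statistics(n1, n2, realisations=20_000, N=3, seed=0):
    """Averages of f(n1) and of f(n1) * f(n2) over independent realisations."""
    rng = random.Random(seed)
    sum_f = sum_ff = 0
    for _ in range(realisations):
        f = coin_tossing_function(N, rng)
        sum_f += f[n1]
        sum_ff += f[n1] * f[n2]
    return sum_f / realisations, sum_ff / realisations

print(estimate_statistics(1, 2))  # both averages should be close to 0
print(estimate_statistics(2, 2))  # second average is exactly 1, since f(n)^2 = 1
```

For $n_1 \ne n_2$ the estimates fluctuate about zero with a scatter of order $1/\sqrt{M}$ for $M$ realisations, which is consistent with the Kronecker delta result.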


The correlation coefficient 
was introduced in 
Subsection 1.2.2. 


Recall the discussion in 
Subsection 1.2.2. 


This was shown in the 
solution to Exercise 1.21. 


See equation (1.28). 


Recall the result of 
Exercise 1.21: the solution 
is given for the case of 
continuous distributions, 
but the same reasoning 
applies to the discrete case. 




Example 2.1 


If a coin has a ‘bias’, such that the probability of a toss falling heads up is
$p \ne \frac{1}{2}$, calculate $\langle f(n) \rangle$, $\langle f(n_1) f(n_2) \rangle$ and $C(n_1, n_2)$ in terms of $p$.


Solution


The function $f(n)$ takes the value $+1$ if the coin falls heads up, and $-1$ if
the coin falls tails up. If the probability of heads is $p$, then the probability
of tails is $1 - p$. So we have

$$\langle f(n) \rangle = p + (-1)(1 - p) = 2p - 1. \tag{2.6}$$

As the trials are independent, when $n_1 \ne n_2$ we have

$$\begin{aligned}
P(+1, +1, n_1, n_2) &= P(+1, n_1)\,P(+1, n_2) = p^2, \\
P(-1, -1, n_1, n_2) &= P(-1, n_1)\,P(-1, n_2) = (1 - p)^2, \\
P(+1, -1, n_1, n_2) &= P(+1, n_1)\,P(-1, n_2) = p(1 - p) \\
  &= P(-1, +1, n_1, n_2).
\end{aligned} \tag{2.7}$$

If $n_1 = n_2 = n$ (say), then

$$\begin{aligned}
P(\pm 1, \mp 1, n, n) &= 0, \\
P(+1, +1, n, n) &= p, \\
P(-1, -1, n, n) &= 1 - p.
\end{aligned} \tag{2.8}$$

For $n_1 \ne n_2$,

$$\begin{aligned}
\langle f(n_1) f(n_2) \rangle &= (+1)(+1)\,p^2 + (-1)(-1)(1 - p)^2 \\
  &\quad + (+1)(-1)\,p(1 - p) + (-1)(+1)(1 - p)\,p \\
  &= 4p^2 - 4p + 1 \\
  &= (2p - 1)^2
\end{aligned} \tag{2.9}$$

(as must be the case, since $C(n_1, n_2) = \langle f(n_1) f(n_2) \rangle - \langle f(n_1) \rangle \langle f(n_2) \rangle = 0$
when $n_1 \ne n_2$). As expected, this vanishes when $p = \frac{1}{2}$. When $n_1 = n_2 = n$,
we have

$$\langle [f(n)]^2 \rangle = (+1)^2 p + (-1)^2 (1 - p) = 1. \tag{2.10}$$

The results for $\langle f(n_1) f(n_2) \rangle$ can be summarised in a single expression with
the help of the Kronecker delta symbol:

$$\langle f(n_1) f(n_2) \rangle = 1 + 4p(p - 1)(1 - \delta_{n_1, n_2}). \tag{2.11}$$

The correlation function must equal zero when $n_1 \ne n_2$, because $f(n_1)$ and
$f(n_2)$ are independent:

$$\begin{aligned}
C(n_1, n_2) &= \langle f(n_1) f(n_2) \rangle - \langle f(n_1) \rangle \langle f(n_2) \rangle \\
  &= 1 + 4p(p - 1)(1 - \delta_{n_1, n_2}) - (2p - 1)^2 \\
  &= 4p(1 - p)\,\delta_{n_1, n_2}.
\end{aligned} \tag{2.12}$$

As expected, the results of the discussion preceding this example are
recovered when $p = \frac{1}{2}$. ■
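
The results of Example 2.1 can be verified by summing over the joint probabilities with exact rational arithmetic. The following Python sketch is an illustration only; the function name `biased_statistics` and the flag `same_site` (distinguishing $n_1 = n_2$ from $n_1 \ne n_2$) are our own.

```python
from fractions import Fraction

def biased_statistics(p, same_site):
    """Exact mean, <f(n1) f(n2)> and C(n1, n2) for a coin with P(heads) = p."""
    prob = {+1: p, -1: 1 - p}
    mean = sum(s * prob[s] for s in prob)
    if same_site:
        # n1 = n2: both factors are the same toss.
        product_mean = sum(s * s * prob[s] for s in prob)
    else:
        # n1 != n2: independent tosses, so the joint probabilities factorise.
        product_mean = sum(s1 * s2 * prob[s1] * prob[s2] for s1 in prob for s2 in prob)
    return mean, product_mean, product_mean - mean ** 2

p = Fraction(2, 3)
print(biased_statistics(p, same_site=False))  # mean 1/3, product mean 1/9, C = 0
print(biased_statistics(p, same_site=True))   # mean 1/3, product mean 1, C = 8/9
```

With $p = \frac{2}{3}$ the values agree with $2p - 1$, $(2p - 1)^2$ and $4p(1 - p)$ respectively.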




Exercise 2.1 


You can generate a discrete random function $f(n)$ by throwing a six-sided die: pick
integers $n$ in sequence, and for each $n$ define $f(n)$ to be the number uppermost on
throwing the die. What are the values of $\langle f(n) \rangle$, $\langle f(n_1) f(n_2) \rangle$ and the correlation
function $C(n_1, n_2)$?


The coin-tossing function is not very useful in its own right. Its principal use 
is to generate other types of random function, which can be used to model 
interesting phenomena. In the next section we shall introduce the discrete 
random walk, which can be developed as a model for the processes illustrated 
in Figures 0.6 and 0.7. We shall concentrate on discussing random walks, 
because of their close connection with diffusion processes. At the end of 
the chapter, in Section 2.8, we introduce another type of random function 
which can be used to model smoothly varying random systems, such as 
the temperature series of Figure 0.3 or the model ocean waves shown in 
Figure 0.5. 


2.3 Random walks 


2.3.1 Definition and mathematical description 


A random walk is a process in which a particle with position X takes a 
succession of randomly chosen steps at a sequence of times labelled by a 
number T. The position X(T’) is therefore a particular type of random func- 
tion. We shall use the term ‘random walk’ for both individual realisations 
and the process generating these random functions. In most applications, 
X represents a physical position, but in some applications it can be another 
type of variable, such as the price of a commodity. 


We start by considering the simplest case, which we call the simple random
walk, which is constructed by the following procedure. Place the particle at
the origin, so that at time $T = 0$ its position is $X = 0$. At each step, increase
$T$ by 1, and toss a coin. If it falls heads up, increase $X$ by 1; but decrease it
by 1 if the coin falls tails up. This process is illustrated in Figure 2.2. Here,
$T$ takes non-negative integer values, and $X$ takes integer values.


Figure 2.2 Illustrating the process for generating a simple random walk: the
particle is moved one step to the right if the coin falls heads up, and one step to
the left if it falls tails up


Two realisations of this random walk are plotted in Figure 2.3. Note that
$X$ is even when $T$ is even, and $X$ is odd when $T$ is odd.


Figure 2.3 Two realisations of the simple random walk


Exercise 2.2 


Generate two different simple random walks, each one the result of tossing a coin
$T = 10$ times, and plot them in the same fashion as in Figure 2.3. (No solution is
given, because there are many different possible outcomes.)


Random walks are excellent models for many phenomena in the physical 
world, and are also useful models in many other contexts. Some examples 
will be given shortly, but first we consider how to describe random walk 
processes mathematically. The simple random walk described above can be 
constructed from the coin-tossing function f(n) described in Section 2.2. 
The steps of the variable X, which randomly takes values +1 with equal 
probability, can be taken to be the values of a realisation of the coin-tossing 
function f(n), so that X(T) is the sum of these values: 


¥ 
X(T) =~ f(n), where T=1,2,... and X(0) =0. (2.13) 
ool 


It is instructive to express the simple random walk using the recurrence 
relation 


=e bs fe (2.14) 


which describes more directly how the random walk was generated, stating
that the new displacement at time T is the old displacement, X(T − 1), at
the immediately preceding time, T − 1, plus a random number, f(T), which
takes values ±1 with equal probability.
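
The two descriptions of the simple random walk, the sum (2.13) and the recurrence (2.14), can be compared directly on a computer. The following is a minimal sketch, assuming Python's standard pseudo-random generator as a stand-in for tossing a coin; the function names are illustrative only and do not come from the course text.

```python
import random
from itertools import accumulate


def coin_tosses(T, rng):
    """One realisation f(1), ..., f(T) of the coin-tossing function (+1 or -1)."""
    return [rng.choice((+1, -1)) for _ in range(T)]


def walk_by_sum(f):
    """X(0), X(1), ..., X(T) as partial sums, following equation (2.13)."""
    return [0] + list(accumulate(f))


def walk_by_recurrence(f):
    """X(0), X(1), ..., X(T) built from X(T) = X(T-1) + f(T), equation (2.14)."""
    X = [0]  # X(0) = 0
    for step in f:
        X.append(X[-1] + step)
    return X


rng = random.Random(2)  # fixed seed (an arbitrary choice) so the run is repeatable
f = coin_tosses(10, rng)
print(walk_by_sum(f))
print(walk_by_recurrence(f))
```

Both functions return the same list for any realisation of f(n), and each entry X(T) has the same parity as T.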




Example 2.2 


Show that equation (2.13) follows directly from equation (2.14) by repeated 
application of the recurrence relation. 


Solution 
From equation (2.14),

\[
\begin{aligned}
X(T) &= X(T-1) + f(T) \\
     &= X(T-2) + f(T-1) + f(T) \\
     &= X(0) + f(1) + f(2) + \cdots + f(T) \\
     &= \sum_{n=1}^{T} f(n),
\end{aligned}
\tag{2.15}
\]

where the second line follows by applying the recurrence relation (2.14) to
X(T − 1) to obtain X(T − 2), and the third line comes from repeating this
procedure T times. Finally, the last line is a consequence of X(0) = 0.


The random walk displacement X(T) at time T is a random variable which
can be described by the probability P(X,T) that it equals X after T steps, or
by means of its statistics, such as the value of ⟨X²⟩ at step T. The statistics
give a very informative description of a random walk. Section 2.4 will show
how these can be calculated directly from the correlation function of the
coin-tossing function, without first calculating the probability P(X,T).


2.3.2 Some examples of random walks 


The random walk is a very important concept, and has a very wide range of 
applications in the physical sciences and applied probability. Some of these 
are described briefly below. 


• A very important application of random walks is in the understanding
of the phenomenon known as Brownian motion. This is the apparently 
random ‘jiggling’ motion of small particles of pollen or dust suspended 
in still water. The motion is not visible to the naked eye, but can be seen 
using a microscope. It is interpreted as being due to the particle being 
jostled by the random motion of the molecules making up the liquid 
(which are too small to be visible even using a microscope). Brownian 
motion was illustrated in Figure 0.6. It is modelled by a random walk in 
which the motion occurs in two or three dimensions, and in which both 
the random displacements and the time intervals between them are very 
small. Random walks in two or three dimensions and the limiting case 
of small steps will both be considered in Chapter 6. 


Brownian motion is named after Robert Brown, a Scottish botanist, who 
(in 1827) noticed agitated motion of pollen grains in water using a mi- 
croscope. It would eventually provide important evidence to support the 
atomic theory of matter. At first, it was believed that the motion of the 
pollen grains was a result of them being ‘alive’. Only after extensive further 
experiments was it accepted that ‘obviously’ inanimate materials, such as 
powdered metals, show the same effect. Even as late as 1865, it was be- 
lieved that the process of grinding materials might create ‘active molecules’, 
which would eventually cease moving. Observations of Brownian motion of 
particles in sealed containers over a period of one year helped to discredit 
that theory. By the late nineteenth century, the correct explanation was 
understood, but not universally accepted. 




The reality of atoms continued to be questioned by prominent scientists 
such as Ernst Mach until the end of the nineteenth century. However, a 
quantitative analysis of Brownian motion by Marian Smoluchowski and
Albert Einstein (who worked independently), followed by experimental work
by Jean Perrin confirming their theory, showed how evidence from Brownian
motion could fit into a consistent atomic theory of matter. Perrin
won the 1926 Nobel prize in Physics for his work on sedimentation equilib- 
rium, a phenomenon which is explained by Einstein’s theory for Brownian 
motion. 


• The macroscopic process of diffusion is a consequence of random motions
of molecules. An example of such a random trajectory is illustrated in 
Figure 2.4: the ‘black’ molecule follows an apparently random path due 
to collisions with the ‘white’ molecules. In liquids and gases it is believed 
that the molecules all have random trajectories of this type, which are 
modelled by a three-dimensional extension of the random walk, of the 
type considered in Chapter 6. 


Figure 2.4 Schematic illustration of the highly irregular motion of molecules 
in a gas: the trajectory of the ‘black’ atom colliding with the ‘white’ atoms 
can be modelled by a random walk in two dimensions 


• Apparently random motion (sometimes called chaotic motion) of particles
is commonly encountered in many other physical situations. In many
cases this motion can be modelled by a random walk. At the smallest 
scale, the conduction of electricity in metals is explained by models in 
which electrons (sub-atomic particles that carry electrical current) follow 
random walks. On the largest scale, the motion of stars through galaxies 
is sometimes modelled by random walks, in which the steps (resulting 
from near collisions with other stars) are separated by millions of years. 


• Random walks also occur frequently in applied probability. Their earliest
appearance was in problems involving games of chance: a gambler’s 
fortune takes random positive or negative steps according to the outcome 
of a sequence of bets. Such problems are complicated by the fact that 
the gambler’s fortune cannot become negative. They will not be pursued 
here, because our principal interest is in the connection with diffusion 
processes. 


• Another application of random walks in probability theory is modelling
changes in the prices of items traded in financial markets, such as com- 
modities or stocks and shares. A large number of apparently unpre- 
dictable decisions by investors to buy and sell causes erratic changes of 
the market price, such as those illustrated in Figure 0.7. 




2.4 Statistics of random walks 


The random walk can be characterised by writing down the probability 
P(X,T) of being at position X after T steps. This is the most complete de- 
scription, and will be considered shortly. However, it is also very instructive 
to examine some statistics of X(T): we shall consider the moments ⟨X(T)⟩
and ⟨X²(T)⟩. These are readily obtained from equation (2.13), with the
help of equations (2.3) and (2.4). The mean value of the displacement of 
the simple random walk, equation (2.13), is 


\[ \langle X(T) \rangle = \left\langle \sum_{n=1}^{T} f(n) \right\rangle = \sum_{n=1}^{T} \langle f(n) \rangle = 0. \tag{2.16} \]

(This follows immediately from equations (1.44) and (2.3).) Equation (2.16) 
tells us that if we generate many realisations of the discrete random walk, 
and average the displacement X after time T, this average will approach zero
as we include more and more realisations. Equation (2.16) reflects the fact
that the displacement is equally likely to be in the direction of decreasing
X as increasing X.


The square of the displacement is never negative, and therefore gives a more 
useful indication of the distance X by which the particle is expected to have 
moved after T steps. Using equations (1.44) and (2.4) gives 


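\[
\begin{aligned}
\langle X^2(T) \rangle
&= \left\langle \left( \sum_{n_1=1}^{T} f(n_1) \right) \left( \sum_{n_2=1}^{T} f(n_2) \right) \right\rangle \\
&= \left\langle \sum_{n_1=1}^{T} \sum_{n_2=1}^{T} f(n_1) f(n_2) \right\rangle \\
&= \sum_{n_1=1}^{T} \sum_{n_2=1}^{T} \langle f(n_1) f(n_2) \rangle \\
&= \sum_{n_1=1}^{T} \sum_{n_2=1}^{T} \delta_{n_1 n_2} \\
&= \sum_{n=1}^{T} 1 = T.
\end{aligned}
\tag{2.17}
\]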


From these results we see that Var(X) = ⟨X²⟩ − ⟨X⟩² = T.
More generally, the simple relationship

\[ \mathrm{Var}(X) = 2DT \tag{2.18} \]


(where D is a constant called the diffusion coefficient) is a characteristic 
property of random walks. In this case the constant 2D is unity, but later we 
consider cases where it may take other values. It indicates that the typical 
distance from the starting point (which could be defined as √Var(X), a
quantity which is always positive) is proportional to the square root of the
number of steps T.
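
The results (2.16) and (2.17) can be checked without any random sampling, because for small T all 2^T coin-toss sequences can be listed and averaged with equal weight. The following is a minimal sketch of that check; the function name `exact_moments` is illustrative, and exhaustive enumeration is only practical for small T.

```python
from fractions import Fraction
from itertools import product


def exact_moments(T):
    """Return (<X>, <X^2>) averaged over all 2**T equally likely step sequences."""
    total_x = 0
    total_x2 = 0
    for steps in product((+1, -1), repeat=T):
        x = sum(steps)  # displacement X(T) for this sequence, equation (2.13)
        total_x += x
        total_x2 += x * x
    n_paths = 2 ** T
    return Fraction(total_x, n_paths), Fraction(total_x2, n_paths)


for T in (1, 2, 5, 10):
    mean, second = exact_moments(T)
    print(T, mean, second - mean ** 2)  # T, <X>, Var(X)
```

For each T the mean is zero and the variance equals T, in agreement with Var(X) = 2DT when 2D = 1.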




Exercise 2.3 


If a particle, which moves along a line by means of a random walk, has been 
randomly displaced by a distance 1 mm after 1 s, how far is it likely to have moved
after one week? Compare this with the distance moved after one week if the motion 
is at a constant velocity. 


Random walks may have a drift, so that the particle moves with a drift 
velocity as well as making random displacements. 

Example 2.3 


A random walk with drift can be obtained as follows. At each step, the 
particle moves by v + 1 with probability 1/2 or by v − 1 with probability 1/2.
Calculate ⟨X⟩ and ⟨X²⟩ after T steps, and show that Var(X) = T. Show
that ⟨X⟩ = vT, so that v can be thought of as a velocity, called the drift
velocity. 


Solution 


Here the displacements are a random function f(n) which takes value v + 1
or v − 1, both with probability 1/2. The mean of f(n) is

\[ \langle f(n) \rangle = \tfrac{1}{2}(v+1) + \tfrac{1}{2}(v-1) = v. \tag{2.19} \]

Because values of f(n) for different values of n are independent, for n1 ≠ n2
we have C(n1,n2) = 0, so

\[ \langle f(n_1) f(n_2) \rangle = \langle f(n_1) \rangle \langle f(n_2) \rangle = v^2 \quad \text{for } n_1 \neq n_2. \tag{2.20} \]

Also,

\[ \langle [f(n)]^2 \rangle = \tfrac{1}{2}(v+1)^2 + \tfrac{1}{2}(v-1)^2 = v^2 + 1. \tag{2.21} \]

The random walk is a sum of values of the random function f(n):

\[ X(T) = \sum_{n=1}^{T} f(n). \tag{2.22} \]

The mean and second moment of X(T) are

\[ \langle X(T) \rangle = \sum_{n=1}^{T} \langle f(n) \rangle = vT \tag{2.23} \]

and

\[ \langle X^2(T) \rangle = \sum_{n_1=1}^{T} \sum_{n_2=1}^{T} \langle f(n_1) f(n_2) \rangle = T(v^2 + 1) + (T^2 - T)v^2 = v^2 T^2 + T, \tag{2.24} \]

because there are T terms where n1 = n2 and T² − T terms where n1 ≠ n2.


From these we find Var(X) = ⟨X²⟩ − ⟨X⟩² = T. Note that the variance is
proportional to T, in agreement with equation (2.18).
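
The results of Example 2.3 can be confirmed for particular values of v and T by averaging over every possible sequence of steps. The sketch below assumes a rational value of v so that the arithmetic is exact; the function name `drift_moments` and the chosen values are illustrative.

```python
from fractions import Fraction
from itertools import product


def drift_moments(v, T):
    """Return (<X>, <X^2>, Var(X)) for steps v + 1 or v - 1, each with probability 1/2."""
    v = Fraction(v)
    weight = Fraction(1, 2 ** T)  # probability of any one sequence of T steps
    mean = Fraction(0)
    second = Fraction(0)
    for signs in product((+1, -1), repeat=T):
        x = sum(v + s for s in signs)  # displacement after T steps
        mean += weight * x
        second += weight * x * x
    return mean, second, second - mean ** 2


v, T = Fraction(1, 3), 6
print(drift_moments(v, T))            # enumerated values
print(v * T, v ** 2 * T ** 2 + T, T)  # equations (2.23), (2.24) and Var(X) = T
```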




Exercise 2.4 


Consider a process in which, at each step, a particle is displaced by +1 with
probability p or by −1 with probability 1 − p. Calculate ⟨X⟩, ⟨X²⟩ and Var(X) after
T steps. In view of the discussion in Example 2.3 and the discussion following
equation (2.18), does this process also describe a random walk with drift? 


Some harder exercises on statistics of random walks can be found at the end 
of this chapter. 


2.5 Probability distribution of a 
random walk 


The aim of this section is to formulate and solve an equation for the
probability P(X,T) of reaching X after T steps, in the case of the simple random
walk illustrated in Figure 2.2. If we generate M realisations of the simple
random walk with T steps, and count the number M_X of these realisations
for which the particle finishes at position X, we expect that in the limit as
M → ∞, the ratio M_X/M approaches a limit P(X,T). We shall now
calculate P(X,T) using the rules of probability theory described in Chapter 1.


The particle can reach position X at step T either from being at position
X + 1 at step T − 1 and throwing tails (so the particle takes a step back),
or from being at position X − 1 at step T − 1 and throwing heads (so the
particle takes a step forward); see Figure 2.5. The probability of reaching
X the first way is

\[ P_{\text{back}} = P(-1) \times P(X+1, T-1), \tag{2.25} \]


where P(−1) = 1/2 is the probability of taking a decreasing step. (Note that
these probabilities multiply because the probability of taking a step to the 
left or right is independent of the current position — recall equation (1.8).) 


Similarly, the second route contributes the probability

\[ P_{\text{forward}} = P(+1) \times P(X-1, T-1), \tag{2.26} \]

where P(+1) = 1/2 is the probability of taking an increasing step.




Figure 2.5 The random walker can arrive at position X from two possible
previous locations. The probability P(X,T) of reaching X at time T is the sum of 
the probabilities for these steps. 


Since arriving at X by taking a step forward and reaching X by a step 
back are mutually exclusive events, the probability of being at X at time 
T > 0 is the sum of these terms (recall equation (1.6)). Thus we have
P(X,T) = P_forward + P_back, or

\[ P(X,T) = \tfrac{1}{2} \left[ P(X-1, T-1) + P(X+1, T-1) \right]. \tag{2.27} \]




This is a recurrence relation, giving the probability at step T in terms of
the values at the preceding step, T − 1. The values of P(X,T) may be
obtained for any positive integer T, using the initial data P(0,0) = 1 and
P(X,0) = 0 for X ≠ 0.


Table 2.1 gives the values of P(X,T) up to T = 4, obtained by a direct
calculation of all of the P(X,1), then all of the P(X,2), and so on, using
the recurrence formula (2.27).


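Table 2.1 Probabilities P(X,T) for the simple random walk up to T = 4

         X = −4    −3    −2    −1     0     1     2     3     4
T = 0        0      0     0     0     1     0     0     0     0
T = 1        0      0     0    1/2    0    1/2    0     0     0
T = 2        0      0    1/4    0    1/2    0    1/4    0     0
T = 3        0     1/8    0    3/8    0    3/8    0    1/8    0
T = 4      1/16     0    1/4    0    3/8    0    1/4    0    1/16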
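
The row-by-row calculation that produces Table 2.1 is easy to automate. The following is a minimal sketch, assuming exact rational arithmetic; instead of gathering the two contributions to each P(X,T) as in equation (2.27), it passes half of each P(X,T − 1) on to X + 1 and half to X − 1, which gives the same result. The function names are illustrative.

```python
from fractions import Fraction


def next_row(row):
    """Given {X: P(X, T-1)}, return {X: P(X, T)} for the simple random walk."""
    half = Fraction(1, 2)
    new_row = {}
    for x, p in row.items():
        for step in (+1, -1):
            # Half of the probability at x moves to x + 1 and half to x - 1,
            # which is equivalent to the recurrence relation (2.27).
            new_row[x + step] = new_row.get(x + step, Fraction(0)) + half * p
    return new_row


def distribution(T):
    """Return the non-zero probabilities {X: P(X, T)}, starting from P(0, 0) = 1."""
    row = {0: Fraction(1)}
    for _ in range(T):
        row = next_row(row)
    return row


for X, p in sorted(distribution(4).items()):
    print(X, p)
```

Positions that do not appear in the returned dictionary have probability zero.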


Exercise 2.5 


Use the recurrence relation (2.27) to determine the probabilities in the next two 
rows of Table 2.1, for T = 5 and T = 6.


We can solve equation (2.27) by investigating a connection with Pascal’s 
triangle, which is illustrated in Figure 2.6. 


                1
              1   1
            1   2   1
          1   3   3   1
        1   4   6   4   1
      1   5  10  10   5   1


Figure 2.6 The first six rows of Pascal’s triangle


Pascal’s triangle (said to have been ‘discovered’ by Blaise Pascal, 1623-62, 
although already known and extensively used by Arab, Chinese and Indian 
mathematicians at least five centuries earlier) is a table of integers in a trian- 
gular format. Each entry is the sum of the two numbers from the row above 
immediately to the right and to the left. If there is no number entered in one of 
these positions, as for the first and last entries of each row, zero is added. For 
example, in row six at position three, the entry is obtained by adding entries 
two and three of row five, giving 4+ 6 = 10. 


Pascal’s triangle was originally devised as a construction to obtain the binomial
coefficients C^n_r. When the binomial (x + y)^n is expanded, terms proportional
to x^r y^(n−r) are obtained, with r taking values from 0 to n. The coefficient of
each such term is the number C^n_r, which is also the (r + 1)th number in the
(n + 1)th row of Pascal’s triangle, starting from C^0_0 = 1.


The binomial coefficient C^n_r is also the number of ways of selecting r choices
from n objects, if the order in which the choice is made is irrelevant.




The initial conditions can 
be expressed concisely in 
terms of the Kronecker 
delta symbol: 

P(X,0) = δ_{X,0}.


The formula for C^n_r will be
given shortly.




Exercise 2.6 


Determine the numbers in the next row of Pascal’s triangle. Also, expand (1 + x)^6,
and confirm that the coefficients of x^r are given by the numbers that you have just
calculated. 


Table 2.1 should be compared with the first five rows of Pascal’s triangle.
It can be seen that the probabilities for any given value of T are equal
to corresponding entries in the (T + 1)th row of Pascal’s triangle, divided
by 2^T.


One way to understand the connection with Pascal’s triangle is to use the
interpretation of its elements C^n_r as the number of ways of selecting r choices
from n objects. After T steps, the simple random walk has made N positive
steps and T − N negative steps. Each possible path for the walk involves
choosing T steps independently, each with probability 1/2, so the choice of
any particular realisation has probability (1/2)^T. The probability P(X,T) of
choosing any of the C^T_N random walks of N positive steps from a total of T
is

\[ P(X,T) = \frac{1}{2^T} C^T_N, \tag{2.28} \]

where X and T are such that there are exactly N positive steps. It is more
natural to express the right-hand side of equation (2.28) in terms of X and
T rather than N and T. This can be done by relating the number of positive
steps N to the position X: we have X = N × (+1) + (T − N) × (−1), so X =
2N − T. Rearranging gives N = (X + T)/2. Expressing equation (2.28) in
terms of X, we find that equation (2.27) is expected to have a solution of
the form

\[ P(X,T) = \frac{1}{2^T} C^T_{(X+T)/2}. \tag{2.29} \]

Note that this solution applies only when X and T have the same parity (i.e.
both even or both odd). As can be seen from the expression X = 2N − T,
it is impossible for the parity of the position X reached by the random walk
to differ from that of the step index T, so when the parities of X and T
differ, P(X,T) = 0 (see Table 2.1).
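
Equation (2.29), together with the parity rule, translates directly into a short function. The sketch below assumes Python 3.8 or later, where `math.comb` supplies the binomial coefficient; the function name `p_exact` is illustrative.

```python
from fractions import Fraction
from math import comb


def p_exact(X, T):
    """P(X, T) for the simple random walk, from equation (2.29)."""
    if abs(X) > T or (X + T) % 2 != 0:
        return Fraction(0)  # unreachable position, or parities of X and T differ
    N = (X + T) // 2        # number of positive steps
    return Fraction(comb(T, N), 2 ** T)


print([str(p_exact(X, 4)) for X in range(-4, 5)])

# Spot check of the recurrence relation (2.27) at one point:
print(p_exact(3, 7) == (p_exact(2, 6) + p_exact(4, 6)) / 2)
```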


Equation (2.29) is the required solution of equation (2.27) when X and T 
have the same parity. In order to make use of this solution, it is necessary 
to have a formula for the binomial coefficients. The coefficient C^n_r in the
(r + 1)th location along the (n + 1)th row of Pascal’s triangle (with r running
from 0 to n) is given by

\[ C^n_r = \frac{n!}{r!\,(n-r)!}, \tag{2.30} \]

where n! is the factorial of n, given by n! = n × (n − 1) × ··· × 3 × 2 × 1,
and (by convention) 0! = 1.


The arguments presented in the preceding two paragraphs deduced equa- 
tion (2.29) from known facts about the binomial coefficients, without giving 
a proof. The following exercise verifies that equation (2.30) gives the entries 
in Pascal’s triangle, and that solution (2.29) satisfies equation (2.27). 




Exercise 2.7 
The coefficients in Pascal’s triangle satisfy the recurrence relation

\[ C^n_r = C^{n-1}_{r-1} + C^{n-1}_r \tag{2.31} \]

for n ≥ 1, where C^n_r is the (r + 1)th coefficient in the (n + 1)th row, and coefficients
with r ≤ −1 or r ≥ n + 1 are taken to be zero. Note the close similarity of this
relation to equation (2.27). Show that equation (2.30) satisfies this relation, and
also that solution (2.29) satisfies equation (2.27).


Exercise 2.8 


Write down the equation analogous to (2.27) which applies to the model discussed 
in Exercise 2.4. 


Equation (2.29) is an exact formula, but is not particularly instructive, be- 
cause the factorial functions are not easy to use in subsequent calculations. 
The next section gives a simple and very useful approximation for P(X,T). 


2.6 An approximate form for the 
probability distribution 

Figure 2.7 shows the values of the probabilities P(X,T) given by
equation (2.29) plotted as a function of X for two fixed values of T, compared
with an approximate form P_app(X,T) given by the expression

\[ P_{\text{app}}(X,T) = \frac{2}{\sqrt{2\pi T}} \exp\left( -\frac{X^2}{2T} \right). \tag{2.32} \]

For clarity, the figure plots P_app(X,T) for T = 3 and T = 11 as two continuous
curves despite the fact that they represent probabilities at integer values
of X, not probability densities. The exact values of P(X,T) are shown as
dots. This figure shows that equation (2.32) is a very good approximation,
and its accuracy is seen to improve as T increases. It should be understood
that this approximate distribution applies only to values of X with the same
parity (odd or even) as T, and that the probability must be taken to be zero
for the opposite parity (as shown by the dots on the X-axis in Figure 2.7).
The approximate probability (2.32) is a Gaussian function, with a variance
that is proportional to T.
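
The comparison shown in Figure 2.7 can also be made numerically. The following is a minimal sketch that evaluates the exact formula (2.29) and the Gaussian form (2.32) at positions with the same parity as T, and records the largest discrepancy; the function names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb, exp, pi, sqrt


def p_exact(X, T):
    """Exact probability (2.29); zero when X and T have different parity."""
    if abs(X) > T or (X + T) % 2 != 0:
        return 0.0
    return comb(T, (X + T) // 2) / 2 ** T


def p_approx(X, T):
    """Gaussian approximation (2.32), intended for X with the same parity as T."""
    return 2.0 / sqrt(2.0 * pi * T) * exp(-X * X / (2.0 * T))


def max_abs_error(T):
    """Largest |P - P_app| over the reachable positions X = -T, -T + 2, ..., T."""
    return max(abs(p_exact(X, T) - p_approx(X, T)) for X in range(-T, T + 1, 2))


for T in (3, 11, 101):
    print(T, max_abs_error(T))
```

The largest discrepancy shrinks as T grows, which is the behaviour described in the text.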




Figure 2.7 Comparison of exact probabilities for a random walk (dots) with a 
Gaussian approximation (continuous curve), for the cases T = 3 and T = 11 


Exercise 2.9 (Hard exercise)


Estimate ⟨X²⟩ for the approximate probability (2.32), by approximating the sum
over values of X by an integral. Verify that the result is identical to that for the 
exact probability, given by equation (2.17). 


We shall now consider why solution (2.29) can be approximated by the 
Gaussian function (2.32), using an approximation for the logarithm of the 
factorial function N! known as Stirling’s formula: 


\[ \ln(N!) = N\ln(N) - N + \tfrac{1}{2}\ln(2\pi N) + O(1/N). \tag{2.33} \]


The final term indicates that N times the magnitude of the error is bounded 
in the limit as N → ∞.
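
The statement about the error term can be explored numerically. The sketch below evaluates the first three terms of Stirling's formula (2.33) and compares them with ln(N!) computed from the exact factorial; the function names and the values of N are illustrative choices.

```python
from math import factorial, log, pi


def stirling_ln_factorial(N):
    """First three terms of Stirling's formula (2.33), omitting the O(1/N) error."""
    return N * log(N) - N + 0.5 * log(2.0 * pi * N)


def stirling_error(N):
    """Difference between the exact ln(N!) and the three-term Stirling estimate."""
    return log(factorial(N)) - stirling_ln_factorial(N)


for N in (5, 20, 50):
    err = stirling_error(N)
    print(N, err, N * err)  # N * err stays bounded as N grows
```

In these runs the product of N and the error settles close to 1/12, which is consistent with an error of order 1/N.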


Exercise 2.10 


Using an electronic calculator to evaluate Stirling’s formula, estimate 6!, 8! and 10!. 
Compare these estimates with the exact values. 


Confirm that the relative error decreases as N increases. (If x is an approximation
to x_0, the relative error is defined as |x − x_0|/x_0.)


Stirling’s formula is a far from obvious result; the following remarks give 
some insight into the form of equation (2.33). 


Let us obtain upper and lower bounds for ln(n!) by using the function

\[ L(x) = \ln[\mathrm{Int}(x+1)], \tag{2.34} \]

where Int(x) denotes the integer part of x. (For example, Int(23.764) = 23, and we
shall consider only x > 0.) Note that

\[ \ln(n!) = \sum_{j=1}^{n} \ln(j) = \int_0^n dx\, L(x), \tag{2.35} \]

because the integrand is constant over a succession of intervals of unit length
(see Figure 2.8). Also note that since x < Int(x + 1) ≤ x + 1,

\[ \ln(x) < L(x) \le \ln(x+1) \tag{2.36} \]

(as also shown in Figure 2.8). Finally, integrating this inequality from 1 to n
(the interval 0 < x < 1 contributes nothing to the integral of L(x), because
L(x) = 0 there), we conclude that

\[ n\ln(n) - n + 1 < \ln(n!) < (n+1)\ln(n+1) - (n+1) - 2\ln(2) + 2. \tag{2.37} \]


This is consistent with Stirling’s formula, and gives some insight into the form 
of the two leading terms. 




Figure 2.8 The function L(x) = ln[Int(x + 1)] is bounded above by ln(x + 1)
and below by ln(x)


Exercise 2.11


The step from equation (2.36) to equation (2.37) requires the integral of ln(x).
What is this integral?


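Now the expression (2.29) for P(X,T) will be approximated using Stirling’s
formula. First write it in terms of factorial functions, using equation (2.30):

\[ P(X,T) = \frac{1}{2^T} \frac{T!}{\left[\tfrac{1}{2}(T+X)\right]!\,\left[\tfrac{1}{2}(T-X)\right]!} \tag{2.38} \]

(where X + T is even). Now take the natural logarithm of this equation,
using the rule ln(a/b) = ln(a) − ln(b), and apply Stirling’s approximation to
each factorial:

\[
\begin{aligned}
\ln[P(X,T)] &= \ln(T!) - \ln\left[\left(\tfrac{1}{2}(T+X)\right)!\right] - \ln\left[\left(\tfrac{1}{2}(T-X)\right)!\right] - T\ln(2) \\
&\approx T\ln(T) - \left(\frac{T+X}{2}\right)\ln\left(\frac{T+X}{2}\right) - \left(\frac{T-X}{2}\right)\ln\left(\frac{T-X}{2}\right) - T\ln(2) \\
&\quad + \frac{1}{2}\ln\left[\frac{2T}{\pi(T^2 - X^2)}\right].
\end{aligned}
\tag{2.39}
\]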


Exercise 2.12 


Check that you can obtain equation (2.39) by applying formula (2.33) to equa- 
tion (2.38) and dropping the error terms. 




Equation (2.39) can be simplified, using an approximation which is valid
when |X/T| ≪ 1. The following Taylor series will be useful for expanding
the terms involving logarithms:

\[ \ln(1+\varepsilon) = \varepsilon - \frac{\varepsilon^2}{2} + \frac{\varepsilon^3}{3} + O(\varepsilon^4). \tag{2.40} \]

Writing ε = X/T and using equation (2.40), we find after a series of steps
that equation (2.39) gives the simple approximation

\[ \ln(P) \approx -\tfrac{1}{2} T \varepsilon^2 + \tfrac{1}{2}\ln\left(\frac{2}{\pi T}\right). \tag{2.41} \]

The steps leading to this expression are the subject of an end-of-chapter
exercise.

Exponentiating, and substituting ε = X/T, gives

\[ P(X,T) \approx \frac{2}{\sqrt{2\pi T}} \exp\left( -\frac{X^2}{2T} \right), \tag{2.42} \]

which is the Gaussian approximation introduced as equation (2.32). Thus
P(X,T) is approximated by a Gaussian distribution when T ≫ 1 and
|X/T| ≪ 1. This approximation is very useful because the Gaussian
function has more convenient properties than the factorial functions appearing
in equation (2.29).


The end-of-chapter exercises include a structured exercise (Exercise 2.18) to 
help you to fill in the gaps going from (2.39) to (2.41). The calculation is 
lengthy, and is a good test of your ability to do algebra, but it contains no 
important additional ideas. 


2.7 Relationship with the diffusion 
equation 


This section will present an alternative approach to explaining why the 
probability distribution of a simple random walk is well approximated by 
a Gaussian distribution. It is based upon noting a relationship between a 
version of equation (2.27) and a partial differential equation of the form

\[ \frac{\partial P}{\partial T} = D \frac{\partial^2 P}{\partial X^2}. \tag{2.43} \]

Equation (2.43) is an important equation of applied mathematics, known as
the diffusion equation, and the constant D is called the diffusion coefficient.
(See also equation (2.18), where D was also encountered.)
In later chapters it will be shown that the connection with the diffusion
equation is not coincidental.


To understand how equation (2.43) relates to a random walk, we consider
a random walk in which a particle takes steps of length δX (instead of
steps of length 1 as before) at times separated by δT, such that there are
equal probabilities for jumping to the left or right. Repeating the argument
leading to equation (2.27), we obtain

\[ P(X, T+\delta T) = \tfrac{1}{2}\left[ P(X+\delta X, T) + P(X-\delta X, T) \right]. \tag{2.44} \]

It will be assumed that δT and δX are small. Subtracting P(X,T) from
both sides, we have

\[ P(X, T+\delta T) - P(X,T) = \tfrac{1}{2}\left[ P(X+\delta X, T) + P(X-\delta X, T) - 2P(X,T) \right]. \tag{2.45} \]




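Now we make a Taylor series expansion of all the terms about (X,T), and
find

\[ P(X, T+\delta T) - P(X,T) = \frac{\partial P}{\partial T}(X,T)\,\delta T + \frac{1}{2}\frac{\partial^2 P}{\partial T^2}(X,T)\,(\delta T)^2 + \cdots \tag{2.46} \]

and

\[ \tfrac{1}{2}\left[ P(X+\delta X, T) + P(X-\delta X, T) - 2P(X,T) \right] = \frac{1}{2}\frac{\partial^2 P}{\partial X^2}(X,T)\,(\delta X)^2 + \frac{1}{24}\frac{\partial^4 P}{\partial X^4}(X,T)\,(\delta X)^4 + \cdots. \tag{2.47} \]

If we assume that both δX and δT are sufficiently small that only the leading
terms of both these Taylor series need be retained, then on substituting these
terms into equation (2.45) we have

\[ \frac{\partial P}{\partial T} = \frac{(\delta X)^2}{2\,\delta T} \frac{\partial^2 P}{\partial X^2}, \tag{2.48} \]

so the recurrence relation (2.44) can be approximated by the diffusion
equation (2.43), with diffusion coefficient D = (δX)²/(2 δT).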


Now we note that the Gaussian function P_app(X,T) in equation (2.32) is an
exact solution of the diffusion equation. Demonstrating this fact is left to 
the following exercise. 


Exercise 2.13 


Verify that P_app(X,T) = 2 exp(−X²/2T)/√(2πT) is (for T > 0) an exact solution of
the diffusion equation when D = 1/2.


Earlier (in Section 2.6) the use of the Gaussian function Papp(X,T) was 
justified by showing that it approximates the exact solution. This second 
approach has demonstrated that it is also an exact solution of an approxi- 
mate version of equation (2.44), namely the diffusion equation. 
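
A numerical check can complement the analytical verification asked for in Exercise 2.13. The following is a minimal sketch, assuming central finite differences with a small spacing h (an arbitrary choice) as estimates of the partial derivatives; it is a plausibility test at one sample point, not a proof, and the function names are illustrative.

```python
from math import exp, pi, sqrt


def p_app(X, T):
    """Gaussian function of equation (2.32), treated here as a function of real X and T."""
    return 2.0 / sqrt(2.0 * pi * T) * exp(-X * X / (2.0 * T))


def diffusion_residual(X, T, D=0.5, h=1e-3):
    """Finite-difference estimate of dP/dT - D * d2P/dX2 at the point (X, T)."""
    dP_dT = (p_app(X, T + h) - p_app(X, T - h)) / (2.0 * h)
    d2P_dX2 = (p_app(X + h, T) - 2.0 * p_app(X, T) + p_app(X - h, T)) / (h * h)
    return dP_dT - D * d2P_dX2


print(diffusion_residual(1.3, 5.0))         # close to zero when D = 1/2
print(diffusion_residual(1.3, 5.0, D=1.0))  # clearly non-zero for another value of D
```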


2.8 A random function with 
correlations 


We started this chapter by introducing the coin-tossing function f(n), a 
random function for which successive values are uncorrelated, satisfying 
C(n1,n2) = 0 for n1 ≠ n2. In many situations it is necessary to model
random functions with correlations. An example of this would be modelling 
the sequence of daily temperature records illustrated in Figure 0.3; these are 
apparently random, but temperatures on successive days are clearly corre- 
lated. In this and many other examples, the correlation function is expected 
to depend only upon the difference in time between observations, so that 
the correlation function may be written in terms of a function C of a single 
variable: C(n1,n2) = C(n1 − n2). A more complex example, involving a
correlated random function of two variables, is the height of the surface of the
ocean on a windy day, illustrated in Figure 0.5. This section will show how 
a discrete model for such correlated random functions may be constructed, 
and will examine its correlation properties. 




First, we consider a very simple case of a random function with correlations. 
Let M > 1 be an integer, and let f(n) be a realisation of the coin-tossing 
function discussed in Section 2.2. Now let g(n) be the average of the last M 
values of f(n), that is, 


\[ g(n) = \frac{1}{M}\left[ f(n) + f(n-1) + f(n-2) + \cdots + f(n+1-M) \right] = \frac{1}{M}\sum_{j=1}^{M} f(n+1-j). \tag{2.49} \]


The function g(n) is called a running average over the last M values of
f(n). A realisation of g(n) is plotted in Figure 2.10. The values of g(n)
and g(n + 1) will be correlated, because they both contain the same random
numbers f(n), f(n − 1), ..., f(n + 2 − M). However, g(n) and g(n + 20M)
(say) are not correlated, because the values of f(n) that contribute to g(n) 
are distinct from those that contribute to g(n+ 20M). We consider the 
correlation properties of g(n) in the following example. 


Example 2.4 


What is the correlation function C(n,,n2) for the random function defined 
by equation (2.49)? 


Solution 


The correlation function is C(n1,n2) = ⟨g(n1)g(n2)⟩ − ⟨g(n1)⟩⟨g(n2)⟩. We
see immediately that ⟨g(n)⟩ = 0 for all n, because ⟨f(m)⟩ = 0 for all m. The
correlation function is therefore

\[
\begin{aligned}
C(n_1,n_2) = \langle g(n_1) g(n_2) \rangle
&= \frac{1}{M^2}\sum_{j=1}^{M}\sum_{k=1}^{M} \langle f(n_1+1-j)\, f(n_2+1-k) \rangle \\
&= \frac{1}{M^2}\sum_{j=1}^{M}\sum_{k=1}^{M} \delta_{n_1+1-j,\; n_2+1-k}.
\end{aligned}
\tag{2.50}
\]

The final step uses equation (2.4). The double summation therefore counts
the number of common elements of the sets {n1, n1 − 1, ..., n1 + 1 − M}
and {n2, n2 − 1, ..., n2 + 1 − M}. The elements of these two sets are shown
as bold marks in Figure 2.9, and the common elements are indicated by
vertical lines. We see that the number of common elements is equal to M
when n1 = n2, and reduces by one every time |n1 − n2| increases by one,
until it becomes zero when |n1 − n2| = M. It follows that

\[
C(n_1,n_2) =
\begin{cases}
\dfrac{M - |n_1 - n_2|}{M^2} & \text{for } 0 \le |n_1 - n_2| < M, \\[1ex]
0 & \text{otherwise.}
\end{cases}
\tag{2.51}
\]
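
The counting argument of Example 2.4 can be checked by evaluating the double sum of Kronecker deltas in equation (2.50) directly and comparing it with the closed form (2.51). The sketch below uses exact fractions; the function names and the value M = 6 (the value used in Figure 2.9) are illustrative choices.

```python
from fractions import Fraction


def correlation_double_sum(n1, n2, M):
    """Evaluate (2.50): count pairs (j, k) with n1 + 1 - j == n2 + 1 - k, over M**2."""
    common = sum(
        1
        for j in range(1, M + 1)
        for k in range(1, M + 1)
        if n1 + 1 - j == n2 + 1 - k
    )
    return Fraction(common, M * M)


def correlation_closed_form(n1, n2, M):
    """Evaluate (2.51): (M - |n1 - n2|) / M**2 when |n1 - n2| < M, otherwise zero."""
    separation = abs(n1 - n2)
    if separation < M:
        return Fraction(M - separation, M * M)
    return Fraction(0)


M = 6
for n2 in range(0, 8):
    print(n2, correlation_double_sum(0, n2, M), correlation_closed_form(0, n2, M))
```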




Figure 2.9 The correlations of the running average result from those values of n
for which f(n) contributes to both g(n1) and g(n2): the values of f(n)
contributing to each respective function are indicated by bold marks, and the
common values are indicated by vertical lines. Here M = 6, n2 − n1 = 3, and
there are M − |n1 − n2| = 3 common values of n.




Figure 2.10 <A realisation of the correlated random function g(n) defined in 
equation (2.49) as the running average of M values of a realisation of the 
coin-tossing function, where M = 25


Thus we have seen that taking a running average over the last M values
of the coin-tossing function can be used as a correlated random function. 
A more general correlated discrete random function can be constructed as 
follows. Let w(n) be a weight function (note that, here, w(n) is never 
random), which approaches zero rapidly as |n| — oo, and let f(n) be a 
realisation of the coin-tossing function introduced in Section 2.2, with a 
correlation function given by equation (2.4). We define 


\[ g(n) = \sum_{m=-\infty}^{\infty} w(n-m)\, f(m), \tag{2.52} \]

which is a random function (because f(n) is random). The running average 
in equation (2.49) is a special case of this construction, where w(n) = 1/M 
for 0 ≤ n ≤ M − 1, and w(n) = 0 otherwise. Equation (2.52) is a
discrete convolution of the random function f(m) and the weight function
w(n). In many applications w(n) would be non-zero for all n, for example
a Gaussian function, w(n) = A exp[−(n/r)²], or an exponential function,
w(n) = A exp(−|n|/r). Normally, it is desired that many values of f(m)
make a significant contribution to the sum in equation (2.52), so r is chosen 
to be large compared to unity. 
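
The construction in equation (2.52) can be sketched in a few lines. The example below assumes that the infinite sum is truncated to a finite window of offsets (for the Gaussian weight, a hypothetical cut-off at |k| ≤ 4r, beyond which the weights are negligible) and that g(n) is only evaluated where every required value of f(m) is available. All function names are illustrative.

```python
import random
from math import exp


def coin_tosses(count, rng):
    """A realisation of the coin-tossing function, stored as f[0], ..., f[count - 1]."""
    return [rng.choice((+1, -1)) for _ in range(count)]


def running_average_weights(M):
    """w(k) = 1/M for 0 <= k <= M - 1, the special case that gives equation (2.49)."""
    return {k: 1.0 / M for k in range(M)}


def gaussian_weights(r, cutoff):
    """w(k) = exp[-(k/r)^2], truncated to |k| <= cutoff (an assumed cut-off)."""
    return {k: exp(-((k / r) ** 2)) for k in range(-cutoff, cutoff + 1)}


def correlated_function(weights, f):
    """g(n) = sum over k of w(k) f(n - k), a truncated form of (2.52) with k = n - m."""
    k_min, k_max = min(weights), max(weights)
    # Only use n for which every index n - k lies inside the list f.
    return {
        n: sum(w * f[n - k] for k, w in weights.items())
        for n in range(k_max, len(f) + k_min)
    }


f = coin_tosses(200, random.Random(1))  # fixed seed so the run is repeatable
g_smooth = correlated_function(gaussian_weights(r=2, cutoff=8), f)
g_average = correlated_function(running_average_weights(5), f)
print(list(g_smooth.items())[:3])
```

With the running-average weights the result coincides with averaging the last M values of f directly, which is the special case described in the text.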


Figure 2.11 shows a realisation of the function g(n) defined by equation (2.52) 
in the case where w is a Gaussian function, w(n) = exp[−(n/2)²]. The
function values are unpredictable, but g(n1) and g(n2) are correlated in that they
tend to take nearby values, over a range of approximately |n1 − n2| ≈ 2r = 4.
This random function is still defined only for integer values of the argu- 
ment n, but the function values g(n) are now real numbers, rather than in- 
tegers. The process of taking the discrete convolution with w(n) ‘smoothes 
out’ the highly erratic fluctuations of the coin-tossing function, so although 
g(n) is still a random function, it has a pleasingly ‘smooth’ dependence on n. 




Discrete convolutions were 
mentioned in Block I, 
Chapter 3: see 

equation (3.47). 




Figure 2.11 A realisation of the correlated random function g(n) defined in
equation (2.52). The weight function is a Gaussian, w(n) = exp[−(n/2)²].


There are two respects in which equation (2.52) is more useful than equa- 
tion (2.49) as a model for a correlated random function. First, the form 
of the correlation function for the running average, equation (2.49), is pre- 
scribed, whereas different choices of w(n) enable the correlation function of 
g(n), given by equation (2.52), to be varied to match different applications. 
Secondly, for some choices of w(n), the function g(n) takes a continuous
range of values, which is more natural when modelling many physical phe- 
nomena. 


Now we shall develop a theory for the correlation properties of the function 
g(n) defined in equation (2.52). The approach is to obtain statistics for g(n)
in terms of the statistics for the coin-tossing function f(n). This will be 
done in the following exercise. 


Exercise 2.14 


For g(n) as given in equation (2.52), show that ⟨g(n)⟩ = 0. [Hint: Use
equation (1.44).]

Show that the correlation function of g(n1) and g(n2) is a function of n2 − n1, given
by

\[ C(n_2 - n_1) = \langle g(n_1)\, g(n_2) \rangle = \sum_{m=-\infty}^{\infty} w(n_1 - n_2 + m)\, w(m). \tag{2.53} \]

[Hint: Start with the definition of the correlation function of g, i.e. C(n1,n2) =
⟨g(n1)g(n2)⟩ − ⟨g(n1)⟩⟨g(n2)⟩, and express the function g in terms of the
coin-tossing function f using equation (2.52). Now use equation (1.44) to write the
average of the double sum as a sum of averages, obtaining a double sum over
⟨f(m1)f(m2)⟩. Finally, use the expression (2.4) for this correlation, reduce the
double sum to a single summation, and make a change of variable.]


Exercise 2.15 
Show that the correlation function C(n) obtained in Exercise 2.14 satisfies

$$\sum_{n=-\infty}^{\infty} C(n) > 0 \qquad (2.54)$$

(unless w(n) is identically zero for all n).



2.9 Summary and discussion 


Summary 


The important points from this chapter are listed below. 


Random functions are important for modelling many situations. They
may be described by their statistical properties: for example, a random
function f(n) defined for integer n can be described by its mean
$\langle f(n) \rangle$ and correlation function
$C(n_1, n_2) = \langle f(n_1) f(n_2) \rangle - \langle f(n_1) \rangle \langle f(n_2) \rangle$.
We introduced a very simple random function, the coin-tossing function,
with simple statistics $\langle f(n) \rangle = 0$ and
$C(n_1, n_2) = \delta_{n_1,n_2}$.


Other random functions can be obtained from the coin-tossing function
f(n) by taking linear combinations of the values f(n). Two examples
were considered: the random walk was introduced in Section 2.3, and
the correlated random function in Section 2.8. The discrete random
walk X(T) is a sum of T values of the coin-tossing function:
$X(T) = \sum_{n=1}^{T} f(n)$. The correlated random function g(n) is a
discrete convolution of f(n) with a weight function w(n). These two types
of random function have different properties and applications.


A characteristic property of random walks is that the variance of the
displacement Var(X) is proportional to T, implying that the typical spread
of the displacement (which could be defined as the standard deviation
of X) is proportional to $\sqrt{T}$.


For large values of T, the probability distribution of a random walk is 
well approximated by a Gaussian function. Stirling’s formula was used 
to establish this. 


The probability distribution of a discrete random walk satisfies a recur- 
rence relation, which can be approximated by the diffusion equation. 
This indicates a close relationship between random walks and the phe- 
nomenon of diffusion. 
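The Gaussian approximation mentioned in the summary can be checked numerically. This is a minimal sketch that takes the exact distribution of the simple random walk to be the standard binomial form $P(X,T) = 2^{-T}\,{}^{T}C_{(X+T)/2}$ and the approximation to be $P_{\rm app}(X,T) = 2\exp(-X^2/2T)/\sqrt{2\pi T}$; the function names are illustrative.

```python
import math

def exact_probability(X, T):
    """Binomial form of P(X, T); zero unless X has the same parity as T and |X| <= T."""
    if abs(X) > T or (T + X) % 2:
        return 0.0
    return math.comb(T, (T + X) // 2) / 2 ** T

def gaussian_approximation(X, T):
    """P_app(X, T) = 2 exp(-X^2 / 2T) / sqrt(2 pi T)."""
    return 2.0 * math.exp(-X * X / (2.0 * T)) / math.sqrt(2.0 * math.pi * T)

T = 100
for X in (0, 10, 20):
    print(X, exact_probability(X, T), gaussian_approximation(X, T))

# The variance of the exact distribution should equal T.
variance = sum(X * X * exact_probability(X, T) for X in range(-T, T + 1))
print("variance:", variance)
```

For T = 100 the two columns should agree to within about one per cent near the centre of the distribution, with the relative discrepancy growing in the tails.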


Discussion 


There are two important issues which are left for discussion in later chapters. 


It was shown that the random walk is closely related to the diffusion 
equation, which is used to describe the apparently deterministic motion 
of dissolved materials and of heat. It is important to understand the 
relationship between the diffusion equation and the random walk. Be- 
fore this can be done, the diffusion equation itself must be understood, 
and the next two chapters will introduce this equation, and methods 
for its solution. Later chapters will show how diffusion, an apparently 
deterministic macroscopic process, arises from the random microscopic 
motion of molecules. 


The random function and random walk were defined in this chapter only 
on a discrete space, the set of integers. For many physical applications, 
functions defined upon a continuum are required — in one dimension, this 
is the real line. Most of the properties that have been described above 
extend in a natural way to functions defined on the real line. Many 
applications of random functions defined as functions of a continuous 
variable are related to diffusion processes. It is therefore natural to defer 
discussion of these until after work on the diffusion equation. Later, in 
Chapter 6, we shall discuss random walks defined on the real line. 




2.10 Outcomes 


In addition to being aware of the points listed in Section 2.9, after studying
this chapter you should:

• be able to calculate simple statistical properties for discrete random
functions which are closely related to those already discussed in this
chapter;

• be able to formulate recurrence relations for the probability distributions
of random walks, similar in form to equation (2.27);

• be able to use Stirling’s formula to approximate expressions involving
factorials.


2.11 Further Exercises 


Here are two much harder exercises on the statistics of the simple random 
walk. They are tackled using the approach developed in Section 2.4. 


Exercise 2.16 


Consider the quantity $\langle X(T_1) X(T_2) \rangle$ for a simple random walk.
(This is the mean value of the product of the position $X(T_1)$ at time $T_1$
and the position of the same realisation $X(T_2)$ at time $T_2$.) Show that
$\langle X(T_1) X(T_2) \rangle = \min(T_1, T_2)$, where

$$\min(a, b) = \begin{cases} a, & a \le b, \\ b, & a > b. \end{cases} \qquad (2.55)$$

Exercise 2.17 


Show that for the simple random walk,

$$\langle f(n_1) f(n_2) f(n_3) f(n_4) \rangle = \delta_{n_1,n_2}\delta_{n_3,n_4} + \delta_{n_1,n_3}\delta_{n_2,n_4} + \delta_{n_1,n_4}\delta_{n_2,n_3} - 2\,\delta_{n_1,n_2}\delta_{n_1,n_3}\delta_{n_1,n_4}, \qquad (2.56)$$

where f(n) is the coin-tossing function introduced in Section 2.2 which takes the
values $\pm 1$. Hence show that for the discrete random walk (2.13), we have

$$\langle X^4(T) \rangle = 3T^2 - 2T. \qquad (2.57)$$

Compare this with the result of Exercise 1.28, showing that there is agreement at
leading order as $T \to \infty$.


This is a very hard exercise. Do not worry if you find the worked solution
difficult. The result is instructive, but not essential to understanding
other parts of this chapter.




The following exercise will guide you through the calculation to obtain equa- 
tion (2.41) from equation (2.39). 


Exercise 2.18 


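Writing $\epsilon = X/T$ and using the Taylor series (2.40), show that

$$\left(\frac{T+X}{2}\right) \ln\left(\frac{T+X}{2}\right) = \frac{T}{2}(1+\epsilon)\left[\ln\left(\frac{T}{2}\right) + \ln(1+\epsilon)\right]$$

$$= \frac{T}{2}\ln\left(\frac{T}{2}\right) + \frac{T}{2}\left[1 + \ln\left(\frac{T}{2}\right)\right]\epsilon + \frac{T}{4}\epsilon^2 + O(\epsilon^3). \qquad (2.58)$$

An analogous expression for the term in equation (2.39) containing $T - X$ is
obtained by changing the sign of $\epsilon$. Substitute these expressions into
equation (2.39) to obtain

$$\ln[P(X,T)] \approx \frac{1}{2}\ln\left(\frac{2}{\pi T}\right) - \frac{T}{2}\epsilon^2 - \frac{1}{2}\ln(1 - \epsilon^2). \qquad (2.59)$$

Now use equation (2.40), with $\lambda$ replaced by $-\epsilon^2$, to show that the
final term of equation (2.59) may be approximated by $\epsilon^2/2$. When
$T \gg 1$, this term is negligible compared to the term $-T\epsilon^2/2$ of
equation (2.59), leading to equation (2.41).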




Solutions to Exercises in Chapter 2 


Solution 2.1 


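The possible values of f(n) are {1, 2, 3, 4, 5, 6}, each occurring with
probability $P_i = \frac{1}{6}$. The mean value is

$$\langle f(n) \rangle = \sum_{i=1}^{6} i\, P_i = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 7/2.$$

When $n_1 \ne n_2$, the random variables $f(n_1)$ and $f(n_2)$ are independent, so

$$\langle f(n_1) f(n_2) \rangle = (7/2)^2 = 49/4.$$

When $n_1 = n_2$, we have

$$\langle f(n_1) f(n_2) \rangle = \langle [f(n)]^2 \rangle = \sum_{i=1}^{6} i^2 P_i = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6.$$

The correlation function $C(n_1, n_2)$ is equal to zero when $n_1 \ne n_2$, because
$f(n_1)$ and $f(n_2)$ are independent. (This can be checked by calculating
$C(n_1, n_2) = \langle f(n_1) f(n_2) \rangle - \langle f(n_1) \rangle \langle f(n_2) \rangle$
directly, from the results above.)

When $n_1 = n_2 = n$, we have

$$C(n, n) = \langle [f(n)]^2 \rangle - [\langle f(n) \rangle]^2 = 91/6 - (7/2)^2 = 35/12.$$

The required statistics are therefore

$$\langle f(n) \rangle = \frac{7}{2},$$

$$\langle f(n_1) f(n_2) \rangle = \frac{49}{4} + \frac{35}{12}\,\delta_{n_1,n_2},$$

$$C(n_1, n_2) = \frac{35}{12}\,\delta_{n_1,n_2}.$$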


Solution 2.2 


No solution is given, because there are many possible outcomes. 


Solution 2.3 


For diffusive motion, the typical displacement is proportional to the square root of
the time. The number of seconds in one week is 60 × 60 × 24 × 7 = 604 800. For
the random walk, the typical displacement is therefore
$\sqrt{604\,800}$ mm ≈ 778 mm = 0.778 m.


For constant velocity 1 mm s$^{-1}$, the distance travelled in one week is
604 800 mm ≈ 605 m.
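The arithmetic above is easily reproduced. The sketch below assumes, as the solution does, that the typical displacement in millimetres is the square root of the elapsed time in seconds; the variable names are illustrative.

```python
import math

seconds_per_week = 60 * 60 * 24 * 7

# Diffusive motion: displacement grows like the square root of the time.
diffusive_mm = math.sqrt(seconds_per_week)

# Constant velocity of 1 mm per second: displacement grows linearly with time.
ballistic_mm = 1.0 * seconds_per_week

print(f"diffusive: {diffusive_mm:.0f} mm, constant velocity: {ballistic_mm / 1000:.0f} m")
```

The ratio of the two distances is itself $\sqrt{604\,800} \approx 778$, which is why diffusion is so slow over long times.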


Solution 2.4 


Following the same approach as in Example 2.3, we define a random function f(n)
which takes the value +1 with probability p, or −1 with probability 1 − p. The
statistics of this random function were given in Example 2.1:

$$\langle f(n) \rangle = 2p - 1$$

and

$$\langle f(n_1) f(n_2) \rangle = \begin{cases} (2p-1)^2, & \text{for } n_1 \ne n_2, \\ 1, & \text{for } n_1 = n_2. \end{cases}$$

Calculating the statistics of X(T) using the approach of Example 2.3 gives

$$\langle X(T) \rangle = (2p - 1)T,$$

$$\langle X^2(T) \rangle = (T^2 - T)(2p - 1)^2 + T,$$

$$\mathrm{Var}(X) = (T^2 - T)(2p - 1)^2 + T - T^2(2p - 1)^2 = [1 - (2p - 1)^2]\,T = 4p(1 - p)T.$$



Again, the variance is proportional to T, so this process is a random walk, now with
diffusion constant D = 2p(1 − p). Also, the mean displacement is proportional to T,
and the drift velocity is v = 2p − 1. These results agree with those for the simple
random walk when $p = \frac{1}{2}$.
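These formulas can be verified exactly for small T by summing over the number of +1 steps. The sketch below assumes only that the steps are independent, so that the number of +1 steps is binomially distributed; `biased_walk_statistics` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def biased_walk_statistics(T, p):
    """Exact mean and variance of X(T) when each step is +1 with probability p."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for k in range(T + 1):                      # k = number of +1 steps
        prob = comb(T, k) * p ** k * (1 - p) ** (T - k)
        X = 2 * k - T                           # displacement after T steps
        mean += prob * X
        second_moment += prob * X * X
    return mean, second_moment - mean ** 2

p = Fraction(3, 4)
T = 10
mean, variance = biased_walk_statistics(T, p)
print(mean, (2 * p - 1) * T)            # both should equal (2p - 1) T
print(variance, 4 * p * (1 - p) * T)    # both should equal 4p(1 - p) T
```

Using `Fraction` keeps the arithmetic exact, so the comparison with the closed-form expressions does not depend on rounding.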


Solution 2.5 

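The non-zero entries in the row for T = 5 are

$$P(-5,5) = \tfrac{1}{2}P(-4,4) = \tfrac{1}{32},$$

$$P(-3,5) = \tfrac{1}{2}[P(-4,4) + P(-2,4)] = \tfrac{5}{32},$$

$$P(-1,5) = \tfrac{1}{2}[P(-2,4) + P(0,4)] = \tfrac{5}{16},$$

and by symmetry $P(1,5) = \tfrac{5}{16}$, $P(3,5) = \tfrac{5}{32}$, $P(5,5) = \tfrac{1}{32}$.


For T = 6: $P(-6,6) = \tfrac{1}{64}$, $P(-4,6) = \tfrac{3}{32}$,
$P(-2,6) = \tfrac{15}{64}$, $P(0,6) = \tfrac{5}{16}$, and for
positive X we can again use the symmetry $P(X,T) = P(-X,T)$.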


As a further check, the entries in each row should add up to one. 
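The rows of the table can also be generated mechanically. This sketch assumes the recurrence for the simple random walk has the form $P(X,T) = \frac{1}{2}[P(X-1,T-1) + P(X+1,T-1)]$, with the walker starting at X = 0; the helper name `next_row` is illustrative.

```python
from fractions import Fraction

def next_row(row):
    """Apply P(X, T) = [P(X-1, T-1) + P(X+1, T-1)] / 2 to a dict {X: P(X, T-1)}."""
    new = {}
    for X, P in row.items():
        for step in (-1, 1):
            # Half of the probability at X moves to each neighbouring site.
            new[X + step] = new.get(X + step, Fraction(0)) + P / 2
    return new

row = {0: Fraction(1)}      # T = 0: the walker is certainly at X = 0
rows = [row]
for T in range(1, 7):
    row = next_row(row)
    rows.append(row)

for X in sorted(rows[5]):
    print(X, rows[5][X])
```

Each row sums to one, and `rows[6]` gives the entries for T = 6 in the same way.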


Solution 2.6 
The numbers in the next row are 1, 6, 15, 20, 15, 6, 1. 


Also, $(1 + x)^6 = 1 + 6x + 15x^2 + 20x^3 + 15x^4 + 6x^5 + x^6$, so the
coefficients of $x^n$ are the elements of the seventh row of Pascal’s triangle,
as expected.


Solution 2.7 


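We demonstrate that the binomial coefficients satisfy the recurrence relation:

$${}^{n-1}C_{k-1} + {}^{n-1}C_{k} = \frac{(n-1)!}{(k-1)!\,(n-k)!} + \frac{(n-1)!}{k!\,(n-k-1)!}$$

$$= \frac{(n-1)!}{k!\,(n-k)!}\,[k + (n-k)] = \frac{n!}{k!\,(n-k)!} = {}^{n}C_{k}.$$

Now we apply this result to the proposed solution of the recurrence relation for
P(X,T):

$$\tfrac{1}{2}P(X-1,T-1) + \tfrac{1}{2}P(X+1,T-1) = \frac{1}{2}\,\frac{1}{2^{T-1}}\,{}^{T-1}C_{(X+T)/2-1} + \frac{1}{2}\,\frac{1}{2^{T-1}}\,{}^{T-1}C_{(X+T)/2}$$

$$= \frac{1}{2^{T}}\left[{}^{T-1}C_{(X+T)/2-1} + {}^{T-1}C_{(X+T)/2}\right] = \frac{1}{2^{T}}\,{}^{T}C_{(X+T)/2} = P(X,T),$$

where the penultimate equality follows directly from the first part of the question.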


Solution 2.8 
Following the same reasoning as was used to derive equation (2.27), the equation is

$$P(X,T) = p\,P(X-1,T-1) + (1-p)\,P(X+1,T-1).$$




Solution 2.9 


Using the approximate distribution (2.32), the second moment of X is

$$\langle X^2 \rangle = \sum_X X^2 P_{\rm app}(X,T) = \frac{2}{\sqrt{2\pi T}} \sum_X X^2 \exp(-X^2/2T),$$

where the sum is over integer values of X with the same parity (odd or even) as T.
A summation may be approximated by an integral, as

$$\sum_n \delta x\, f(n\,\delta x) \approx \int_a^b dx\, f(x),$$

where the sum is over all values of n such that $a < n\,\delta x < b$. This is a good
approximation when $\delta x$ is small relative to the scale over which the function
f(x) changes.


When T is large, the function $P_{\rm app}(X,T)$ varies very slowly as a function of X.
We set $\delta x = 2$ in the above expression, because adjacent values of X differ by
two. The summation is then approximated by an integral, divided by $\delta x = 2$:

$$\langle X^2 \rangle \approx \frac{2}{\sqrt{2\pi T}} \cdot \frac{1}{2} \int_{-\infty}^{\infty} dX\, X^2 \exp(-X^2/2T)$$

$$= \frac{(2T)^{3/2}}{\sqrt{2\pi T}} \int_{-\infty}^{\infty} du\, u^2 \exp(-u^2) = \frac{2T}{\sqrt{\pi}}\, I_2 = T,$$

where the substitution $u = X/\sqrt{2T}$ was used to get from the first integral to the
second. The final result is identical to the exact one given by equation (2.17).

The integral $I_2$ is discussed in Section 1.3.
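The quality of the sum-to-integral approximation can be tested by evaluating the sum directly. The sketch below assumes $P_{\rm app}(X,T) = 2\exp(-X^2/2T)/\sqrt{2\pi T}$ and truncates the sum at a hypothetical cutoff of `span` standard deviations, beyond which the terms are negligible.

```python
import math

def p_app(X, T):
    """Gaussian approximation P_app(X, T) = 2 exp(-X^2 / 2T) / sqrt(2 pi T)."""
    return 2.0 * math.exp(-X * X / (2.0 * T)) / math.sqrt(2.0 * math.pi * T)

def second_moment(T, span=12):
    """Sum X^2 P_app(X, T) over integers X with the parity of T, for |X| up to about span * sqrt(T)."""
    x_max = int(span * math.sqrt(T)) + 1
    total = 0.0
    for X in range(-x_max, x_max + 1):
        if (X - T) % 2 == 0:        # keep only X with the same parity as T
            total += X * X * p_app(X, T)
    return total

print(second_moment(100))   # should be very close to T = 100
```

For large T the sum and the integral agree extremely closely, because the summand varies slowly on the scale of the spacing $\delta x = 2$.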


Solution 2.10 


By Stirling’s formula, ln(6!) ≈ 6.5653..., so 6! ≈ 710. The correct value is 6! = 720.
The relative error is 10/720 ≈ 0.014.


Similarly, Stirling’s formula gives ln(8!) ≈ 10.5941... and ln(10!) ≈ 15.0960...,
giving the approximate values 8! ≈ 39 902 and 10! ≈ 3.5987 × 10$^6$. The exact
values are 8! = 40 320 and 10! = 3 628 800, so the relative errors are
approximately 0.010 and 0.008, respectively.


The relative error is seen to decrease as N increases.
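These figures can be reproduced in a few lines. The sketch assumes Stirling’s formula in the form $\ln(N!) \approx N\ln N - N + \frac{1}{2}\ln(2\pi N)$, which gives the values quoted above; the function name is illustrative.

```python
import math

def stirling_ln_factorial(N):
    """Stirling's approximation: ln(N!) ~ N ln N - N + (1/2) ln(2 pi N)."""
    return N * math.log(N) - N + 0.5 * math.log(2.0 * math.pi * N)

for N in (6, 8, 10):
    approx = math.exp(stirling_ln_factorial(N))
    exact = math.factorial(N)
    relative_error = (exact - approx) / exact
    print(N, stirling_ln_factorial(N), approx, exact, relative_error)
```

Extending the loop to larger N shows the relative error continuing to fall, roughly in proportion to 1/N.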


Solution 2.11


By inspection,

$$\frac{d}{dx}\left[x\ln(x) - x\right] = \ln(x),$$

so the indefinite integral of ln(x) is x ln(x) − x.


Solution 2.12 


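After applying Stirling’s formula, and ignoring the error terms, the first line of
equation (2.39) becomes

$$\ln[P(X,T)] \approx T\ln(T) - T + \tfrac{1}{2}\ln(2\pi T) - T\ln(2)$$

$$\qquad - \left(\frac{T+X}{2}\right)\ln\left(\frac{T+X}{2}\right) + \left(\frac{T+X}{2}\right) - \tfrac{1}{2}\ln[\pi(T+X)]$$

$$\qquad - \left(\frac{T-X}{2}\right)\ln\left(\frac{T-X}{2}\right) + \left(\frac{T-X}{2}\right) - \tfrac{1}{2}\ln[\pi(T-X)],$$

which can be rearranged to give the final line.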




Solution 2.13 


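Using the trial solution $P_{\rm app}(X,T) = 2\exp(-X^2/2T)/\sqrt{2\pi T}$, we have

$$\frac{\partial P_{\rm app}}{\partial T}(X,T) = \left[\frac{X^2}{2T^2} - \frac{1}{2T}\right] \frac{2}{\sqrt{2\pi T}} \exp(-X^2/2T) = \frac{1}{2}\,\frac{\partial^2 P_{\rm app}}{\partial X^2}(X,T).$$

It follows that this trial solution satisfies

$$\frac{\partial P}{\partial T} = \frac{1}{2}\,\frac{\partial^2 P}{\partial X^2},$$

which is the diffusion equation with $D = \frac{1}{2}$.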
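The same conclusion can be checked numerically with finite differences. This is a rough sketch, not part of the derivation: it estimates both derivatives of $P_{\rm app}$ by central differences with a hypothetical step `h` and confirms that their combination nearly vanishes.

```python
import math

def p_app(X, T):
    """Trial solution P_app(X, T) = 2 exp(-X^2 / 2T) / sqrt(2 pi T)."""
    return 2.0 * math.exp(-X * X / (2.0 * T)) / math.sqrt(2.0 * math.pi * T)

def residual(X, T, h=1e-3):
    """Central-difference estimate of dP/dT - (1/2) d2P/dX2 at the point (X, T)."""
    dP_dT = (p_app(X, T + h) - p_app(X, T - h)) / (2.0 * h)
    d2P_dX2 = (p_app(X + h, T) - 2.0 * p_app(X, T) + p_app(X - h, T)) / (h * h)
    return dP_dT - 0.5 * d2P_dX2

for X, T in ((0.0, 5.0), (3.0, 10.0), (8.0, 20.0)):
    print(X, T, residual(X, T))
```

The printed residuals should be far smaller than either derivative on its own, which is consistent with $D = \frac{1}{2}$; replacing the factor 0.5 by another value gives residuals that are no longer small.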


Solution 2.14 


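The function f(n) has the statistics $\langle f(n) \rangle = 0$ and
$\langle f(n_1) f(n_2) \rangle = \delta_{n_1,n_2}$. The mean value of the
function g(n) is

$$\langle g(n) \rangle = \left\langle \sum_{m=-\infty}^{\infty} w(n-m)\, f(m) \right\rangle = \sum_{m=-\infty}^{\infty} w(n-m)\, \langle f(m) \rangle = 0.$$

(Note that the w(m) are fixed numbers, not random variables, so
$\langle w(n-m) f(m) \rangle = w(n-m) \langle f(m) \rangle$.) The correlation
function $C(n_1, n_2)$ of g(n) is

$$\langle g(n_1)\, g(n_2) \rangle = \left\langle \sum_{m_1=-\infty}^{\infty} \sum_{m_2=-\infty}^{\infty} w(n_1 - m_1)\, w(n_2 - m_2)\, f(m_1)\, f(m_2) \right\rangle$$

$$= \sum_{m_1=-\infty}^{\infty} \sum_{m_2=-\infty}^{\infty} w(n_1 - m_1)\, w(n_2 - m_2)\, \langle f(m_1) f(m_2) \rangle$$

$$= \sum_{m_1=-\infty}^{\infty} \sum_{m_2=-\infty}^{\infty} w(n_1 - m_1)\, w(n_2 - m_2)\, \delta_{m_1,m_2}$$

$$= \sum_{m_1=-\infty}^{\infty} w(n_1 - m_1)\, w(n_2 - m_1)$$

$$= \sum_{m=-\infty}^{\infty} w(n_1 - n_2 + m)\, w(m).$$

In the last line, the summation variable was changed to $m = n_2 - m_1$.

Note that this is a function of the difference $n_1 - n_2$, rather than of $n_1$ and
$n_2$ separately, so the correlation function depends upon only one variable and
can be written as $C(n_2 - n_1)$.


Solution 2.15


The sum of the values of the correlation function from the previous exercise is

$$\sum_{n=-\infty}^{\infty} C(n) = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} w(m - n)\, w(m)$$

$$= \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} w(k)\, w(m)$$

$$= \left[ \sum_{m=-\infty}^{\infty} w(m) \right]^2 \ge 0,$$

where equality to zero holds only if $\sum_m w(m) = 0$; for a non-negative weight
function, this requires w(n) = 0 for all n.

In the penultimate line, the substitution m − n = k was used.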
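Both results can be illustrated for the Gaussian weight function. The sketch below assumes $C(n) = \sum_m w(m-n)\,w(m)$, which is equation (2.53) rewritten with $n = n_2 - n_1$, and truncates every sum at a hypothetical `cutoff`, beyond which the Gaussian weights are negligible.

```python
import math

def weight(n):
    """Gaussian weight function w(n) = exp[-(n/2)^2]."""
    return math.exp(-(n / 2.0) ** 2)

def correlation(n, cutoff=40):
    """C(n) = sum_m w(m - n) w(m), with the sum truncated to |m| <= cutoff."""
    return sum(weight(m - n) * weight(m) for m in range(-cutoff, cutoff + 1))

# Sum of the correlation function compared with the squared sum of the weights.
total = sum(correlation(n) for n in range(-40, 41))
weight_sum = sum(weight(m) for m in range(-40, 41))

print([round(correlation(n), 4) for n in range(0, 7)])
print(total, weight_sum ** 2)
```

The printed values of C(n) decay over a few lattice sites, in line with the correlation range quoted for this weight function, and the two totals agree to within rounding error.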




Solution 2.16 
Assume that $T_1 \le T_2$. Then

$$\langle X(T_1) X(T_2) \rangle = \left\langle \sum_{n_1=1}^{T_1} f(n_1) \sum_{n_2=1}^{T_2} f(n_2) \right\rangle = \sum_{n_1=1}^{T_1} \sum_{n_2=1}^{T_2} \langle f(n_1) f(n_2) \rangle$$

$$= \sum_{n_1=1}^{T_1} \sum_{n_2=1}^{T_2} \delta_{n_1,n_2} = \sum_{n_1=1}^{T_1} 1 = T_1.$$

Swapping the names of the symbols $T_1$ and $T_2$, we conclude that if $T_2 < T_1$
then this statistic is equal to $T_2$. It follows that
$\langle X(T_1) X(T_2) \rangle = \min(T_1, T_2)$, as required.
Note that the summation over $n_2$ is equal to unity only if $n_1$ is contained in
the set of integers $\{1, 2, \ldots, T_2\}$, so if we had not assumed that
$T_1 \le T_2$, the final line would have been incorrect.


Alternatively, consider Figure 2.12, where $T_2 < T_1$, which shows that there are
only $T_2$ non-zero values in the double sum.


[Figure 2.12: grid of points with axes $n_1$ and $n_2$.]


Figure 2.12 Diagram showing values of $(n_1, n_2)$ contributing to the double sum in
Exercise 2.16 when $T_2 < T_1$. The $T_2$ non-zero values of
$\langle f(n_1) f(n_2) \rangle$ are shown as black squares.
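For small values of $T_1$ and $T_2$ the result can be confirmed by brute force, averaging over every possible sequence of coin tosses. This is only a sanity check under the assumption that all $2^T$ sequences are equally likely; `walk_correlation` is a hypothetical helper name.

```python
from itertools import product

def walk_correlation(T1, T2):
    """Exact <X(T1) X(T2)>, averaged over all equally likely sequences of +1/-1 steps."""
    T = max(T1, T2)
    total = 0
    for tosses in product((-1, 1), repeat=T):
        # X(T1) and X(T2) are partial sums of the same realisation.
        total += sum(tosses[:T1]) * sum(tosses[:T2])
    return total / 2 ** T

for T1, T2 in ((2, 5), (5, 2), (4, 4)):
    print(T1, T2, walk_correlation(T1, T2), min(T1, T2))
```

The enumeration grows as $2^T$, so it is practical only for small T, but it makes no use of the correlation function (2.4) and therefore gives an independent check.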


Solution 2.17 


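When the numbers $n_1$, $n_2$, $n_3$, $n_4$ are all different, the random numbers
$f(n_1)$, $f(n_2)$, $f(n_3)$ and $f(n_4)$ are independent, and the mean value
$\langle f(n_1) f(n_2) f(n_3) f(n_4) \rangle$ can be written as a product of four
factors, each of which is zero. Similarly, if only two or three of these numbers
are the same, the mean value with respect to the remaining number or numbers is
equal to zero. The only possibility for the mean value being non-zero is if the
numbers are equal in pairs: the three possible pairings are
$n_1 = n_2$, $n_3 = n_4$ or $n_1 = n_3$, $n_2 = n_4$ or $n_1 = n_4$, $n_2 = n_3$.
Considering one of these cases ($n_1 = n_3$, $n_2 = n_4$, say), we obtain

$$\langle [f(n_1)]^2 [f(n_2)]^2 \rangle = \langle 1 \times 1 \rangle = 1.$$

In the case where $n_1 = n_2 = n_3 = n_4 = n$, we have $\langle [f(n)]^4 \rangle = 1$,
because $[f(n)]^4$ is always unity. Thus the mean value
$\langle f(n_1) f(n_2) f(n_3) f(n_4) \rangle$ is equal to unity if
the arguments are equal in pairs, and is zero otherwise. This conclusion may be
expressed by writing

$$\langle f(n_1) f(n_2) f(n_3) f(n_4) \rangle = \delta_{n_1,n_2}\delta_{n_3,n_4} + \delta_{n_1,n_3}\delta_{n_2,n_4} + \delta_{n_1,n_4}\delta_{n_2,n_3} - 2\,\delta_{n_1,n_2}\delta_{n_1,n_3}\delta_{n_1,n_4}.$$

The first three of the right-hand-side terms are unity when the arguments are
paired, and zero otherwise. The final term is non-zero only when all arguments are
equal, and in this case $\langle [f(n)]^4 \rangle = 1 + 1 + 1 - 2 = 1$, as required.


Given this result, we have

$$\langle X^4(T) \rangle = \sum_{n_1=1}^{T} \sum_{n_2=1}^{T} \sum_{n_3=1}^{T} \sum_{n_4=1}^{T} \left[ \delta_{n_1,n_2}\delta_{n_3,n_4} + \delta_{n_1,n_3}\delta_{n_2,n_4} + \delta_{n_1,n_4}\delta_{n_2,n_3} - 2\,\delta_{n_1,n_2}\delta_{n_1,n_3}\delta_{n_1,n_4} \right]$$

$$= \sum_{n_1=1}^{T} \sum_{n_3=1}^{T} 1 + \sum_{n_1=1}^{T} \sum_{n_2=1}^{T} 1 + \sum_{n_1=1}^{T} \sum_{n_2=1}^{T} 1 - 2\sum_{n_1=1}^{T} 1$$

$$= T^2 + T^2 + T^2 - 2T$$

$$= 3T^2 - 2T.$$

When studying Gaussian distributions, we found that
$\langle x^4 \rangle = 3[\langle x^2 \rangle]^2$ (see Exercise 1.28). For the
simple random walk, we have $\langle X^2 \rangle = T$, so if the distribution has a
Gaussian probability density, we would expect $\langle X^4 \rangle = 3T^2$. This is
equivalent to the exact result at leading order in T.


Solution 2.18

Setting $\epsilon = X/T$, we have

$$\left(\frac{T+X}{2}\right) \ln\left(\frac{T+X}{2}\right) = \frac{T}{2}(1+\epsilon)\left[\ln\left(\frac{T}{2}\right) + \ln(1+\epsilon)\right]$$

$$= \frac{T}{2}(1+\epsilon)\left[\ln\left(\frac{T}{2}\right) + \epsilon - \frac{\epsilon^2}{2} + O(\epsilon^3)\right]$$

$$= \frac{T}{2}\ln\left(\frac{T}{2}\right) + \frac{T}{2}\left[1 + \ln\left(\frac{T}{2}\right)\right]\epsilon + \frac{T}{4}\epsilon^2 + O(\epsilon^3).$$

Similarly, by changing the sign of $\epsilon$,

$$\left(\frac{T-X}{2}\right) \ln\left(\frac{T-X}{2}\right) = \frac{T}{2}(1-\epsilon)\ln\left[\frac{T}{2}(1-\epsilon)\right]$$

$$= \frac{T}{2}\ln\left(\frac{T}{2}\right) - \frac{T}{2}\left[1 + \ln\left(\frac{T}{2}\right)\right]\epsilon + \frac{T}{4}\epsilon^2 + O(\epsilon^3).$$

Substituting these results into equation (2.39) gives

$$\ln[P(X,T)] \approx T\ln(T) - T\ln(2) - \left( \frac{T}{2}\ln\left(\frac{T}{2}\right) + \frac{T}{2}\left[1 + \ln\left(\frac{T}{2}\right)\right]\epsilon + \frac{T}{4}\epsilon^2 \right)$$

$$\qquad - \left( \frac{T}{2}\ln\left(\frac{T}{2}\right) - \frac{T}{2}\left[1 + \ln\left(\frac{T}{2}\right)\right]\epsilon + \frac{T}{4}\epsilon^2 \right) + \frac{1}{2}\ln\left(\frac{2}{\pi T}\right) - \frac{1}{2}\ln(1 - \epsilon^2).$$

We therefore have

$$\ln[P(X,T)] \approx \frac{1}{2}\ln\left(\frac{2}{\pi T}\right) - \frac{T}{2}\epsilon^2 - \frac{1}{2}\ln(1 - \epsilon^2).$$

Now using equation (2.40), $\ln(1 - \epsilon^2)$ is approximated by $-\epsilon^2$, giving

$$\ln[P(X,T)] \approx \frac{1}{2}\ln\left(\frac{2}{\pi T}\right) - \frac{T}{2}\epsilon^2 + \frac{1}{2}\epsilon^2.$$

When $T \gg 1$,

$$\ln[P(X,T)] \approx \frac{1}{2}\ln\left(\frac{2}{\pi T}\right) - \frac{X^2}{2T},$$

which is equation (2.41).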
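The fourth-moment result of Solution 2.17 can be checked exactly for a range of T. This sketch assumes the binomial form of the distribution for the simple random walk, $P(X,T) = 2^{-T}\,{}^{T}C_{k}$ with $X = 2k - T$; the function name `moment` is illustrative.

```python
from fractions import Fraction
from math import comb

def moment(T, power):
    """Exact <X^power> for the simple random walk, summing over k = number of +1 steps."""
    return sum(
        Fraction(comb(T, k), 2 ** T) * (2 * k - T) ** power
        for k in range(T + 1)
    )

for T in (1, 2, 5, 10, 50):
    exact = moment(T, 4)
    print(T, exact, 3 * T ** 2 - 2 * T, float(exact) / (3 * T ** 2))
```

The last column approaches one as T grows, showing the agreement with the Gaussian value $3T^2$ at leading order.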


CHAPTER 3 
The diffusion equation 


3.1 Introduction 


In this chapter we shall explain a derivation of the diffusion equation, as 
a description both of the process of diffusion of dissolved materials, and of 
changes in temperature due to the flow of heat. The derivation is based upon 
a plausible assumption, which is known as Fick’s law (for particle diffusion) 
or Fourier’s law (for the flow of heat). A more fundamental derivation, based 
upon a random walk model for the motion of molecules, will be considered 
in Chapter 6. Before introducing the physical concepts which lead to the 
diffusion equation, we shall discuss the mathematical form of this equation. 


The diffusion equation is a linear partial differential equation. The
one-dimensional form of the diffusion equation has already been introduced in
Chapter 2, where it appears as equation (2.43). It is reproduced below with
a slightly different notation:

$$\frac{\partial f}{\partial t} = D\,\frac{\partial^2 f}{\partial x^2}. \qquad (3.1)$$

Here D is a positive constant known as the diffusion constant. A function
f(x,t) must be determined which satisfies this equation. As it stands, the
equation does not have a unique solution. Additional information, in the
form of conditions which must be satisfied by the function f(x,t), is
determined by the system that the equation is used to model. These additional
conditions are usually termed initial data and boundary conditions, and will
be discussed in detail in Chapter 4.


The function f(x,t) and the variables x and t may represent various
quantities in different contexts. When the equation is used to describe diffusion,
f represents a probability density or a concentration of particles (which will 
be described shortly). The variable x usually represents a Cartesian coor- 
dinate for the position of a point in space, but diffusion can occur in more 
abstract spaces; for example, in applications to financial mathematics, x 
could represent the value of a commodity. The variable t almost always 
represents time. When the equation is used to describe the flow of heat, 
f represents the temperature at position x and time t. In this context the 
diffusion equation is known as the heat equation. 
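To give a feel for how solutions of equation (3.1) behave, the equation can be stepped forward in time on a grid. The sketch below uses a simple explicit finite-difference update, which is an assumption of this illustration rather than a method from the text; the grid, the initial spike, and the choice to hold the end values fixed are all hypothetical.

```python
def diffuse(f, D, dx, dt, steps):
    """Advance grid values f by an explicit finite-difference update; end values are held fixed."""
    r = D * dt / dx ** 2
    if r > 0.5:
        raise ValueError("this explicit update is unstable when D*dt/dx^2 > 1/2")
    f = list(f)
    for _ in range(steps):
        new = f[:]
        for i in range(1, len(f) - 1):
            # Discrete version of df/dt = D d2f/dx2.
            new[i] = f[i] + r * (f[i + 1] - 2.0 * f[i] + f[i - 1])
        f = new
    return f

# Hypothetical initial data: all of the material at the central grid point.
initial = [0.0] * 101
initial[50] = 1.0
D, dx, dt, steps = 0.5, 1.0, 1.0, 20

profile = diffuse(initial, D, dx, dt, steps)
total = sum(profile)
variance = sum((i - 50) ** 2 * value for i, value in enumerate(profile))
print(total, variance, 2 * D * dt * steps)
```

With D = 1/2 and unit grid spacing and time step, the update reduces to averaging the two neighbouring values, which is the recurrence relation for the simple random walk of Chapter 2. The total is conserved and the variance of the profile grows as 2Dt, as long as the material has not reached the ends of the grid.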


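In three dimensions, the diffusion equation takes the form

$$\frac{\partial f}{\partial t} = D\,\nabla^2 f, \qquad (3.2)$$

where f depends upon position r = (x, y, z) and time t, and where $\nabla^2$ is the
Laplacian operator. Its action on a function f(x, y, z) is defined by

$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}. \qquad (3.3)$$


$\nabla^2$ is pronounced ‘nabla squared’.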


76 Chapter 3 The diffusion equation 


In some texts, the Laplacian of f is written $\Delta f$, rather than $\nabla^2 f$.
The symbol $\Delta$ will be used for another purpose in this chapter. The Laplacian
operator was introduced earlier, in Chapter 2 of Block 0.

$\Delta$ is the upper case Greek ‘delta’.


Exercise 3.1

Calculate V?f for each of the following functions. 
(a) f(x,y,2) =e ty +2 

(b) $f(x,y,z) = x^2 - y^2$


(c) $f(r) = 1/r$, where $r = \sqrt{x^2 + y^2 + z^2}$.


(Do not consider the point where r = 0, where f is not differentiable.) 


Part (c) requires a fair amount of algebra to arrive at a very simple result. 
A more insightful route to the same conclusion will be considered later, in 
Exercise 3.17. 


Exercise 3.2 


(a) Based upon Equations (3.1) and (3.2), guess the form of the diffusion equation 
in two dimensions, with coordinates (x,y) and diffusion constant D. 


(b) Find a solution of the two-dimensional diffusion equation in the form
$p(x,y,t) = A(t)\exp[-\beta(x^2 + y^2)/t]$, where the constant $\beta$ is to be
determined. What is the form of the function A(t)?


[Hint: Substitute this expression into the diffusion equation, and group terms
which have the same dependence on position. You will find a relation between
$\beta$ and D, and a differential equation for A(t).]


This part of the exercise will develop your facility with partial differentiation 
and simple ordinary differential equations. The solution to Exercise 3.24 will 
discuss a more direct alternative approach. 


Having described the mathematical form of the diffusion equation, most of 
the remainder of the chapter will consider how it arises in a physical context. 
This requires introducing a sequence of concepts: concentration, flux, flux 
density, and the continuity equation. Having discussed these concepts, we 
then introduce Fick’s law, and use it to derive the diffusion equation. 


3.2 Diffusion 


Scientific enquiry has demonstrated that all materials are made up of very 
small particles called atoms or molecules. (Molecules are clusters of atoms 
tightly bound together by ‘chemical bonds’, but the distinction between 
atoms and molecules is not important for understanding the mathematical 
content of what follows.) These particles are so small that they are invisible, 
even using a microscope. Any piece of material that you can see with the 
naked eye therefore contains enormous numbers of particles; for example, a
glass of water contains (very roughly) $10^{25}$ water molecules.


Even in a material where there is no detectable motion to the naked eye,
the molecules are always in motion. The motion of the molecules in liquids
and gases (illustrated in the previous chapter in Figure 2.4) is apparently
random, in all possible directions, and with varying speeds (typically
hundreds of metres per second, for materials at room temperature). Atoms in
solids are also constantly in motion, but they vibrate about fixed positions.


Our everyday experience is that objects do not remain forever in motion, but 
lose energy due to the effects of friction. When two atoms collide, however, 
it is now accepted that the total kinetic energy is ‘conserved’, meaning that 
the total kinetic energy after the collision is the same as before. Because 
no energy is lost when atoms collide, the collisions do not result in the 
atoms slowing down. It is natural to ask why atoms are unaffected by 
friction, whereas macroscopic (larger scale) objects are subject to friction. 
The explanation follows from the fact that energy which is apparently lost 
into friction is the result of the kinetic energy of a large body being converted 
into the apparently random motions of many atoms. In the case of atoms, 
there are no smaller constituents into which their kinetic energy can be 
distributed, so that collisions between atoms involve no loss of energy. 


The rapid motion of atoms or molecules is described as happening on a 
microscopic scale, even though it cannot be observed directly even with the 
most powerful optical microscope. (The distance between the molecules in
a liquid might be $10^{-9}$ m or less, whereas a microscope using visible light
does not allow us to distinguish objects much less than $10^{-6}$ m across.)


Although the microscopic motion of atoms cannot be seen directly, it gives 
rise to phenomena that are observable on a larger (‘macroscopic’) scale. 
Diffusion is one such phenomenon: it causes mixing of different types of 
atoms or molecules. Figure 3.1 illustrates a process of diffusion. One layer 
of liquid (clear) is placed on top of another layer (dark). After a while, 
the boundary between these layers is no longer distinct. The explanation is
that some of the molecules in the coloured liquid have migrated up into the 
clear layer, and conversely some molecules from the clear layer have moved 
downwards. This mixing occurs as a result of the microscopic motion of the 
molecules. 


[Figure 3.1 panel labels: ‘Clear water on top of dyed water’; ‘After diffusion has
occurred for several hours’]


Figure 3.1 Diffusion of one liquid into another


Mixing of fluids can involve processes other than diffusion. Mixing by diffusion 
occurs when there is no detectable flow of the fluid. If the fluids are stirred, 
they flow in a complex way that greatly enhances the mixing. 


Another way of mixing fluids is by thermal convection in which temperature 
differences create differences in density, and the less dense fluid flows to the 
surface. In most fluids, density decreases as temperature increases, so if a fluid 
is heated from below, the warm fluid rises away from the source of heat. This
causes a mixing process, called thermal convection. Where flow of the fluid 
is present, this normally mixes substances far more effectively than diffusion 
alone. 


If you want to try the experiment illustrated in Figure 3.1 yourself, you must
make sure that the dyed and clear water do not mix by thermal convection:
keep the beaker on a cold surface, such as a stone floor, in order to avoid this.
A method for observing diffusion by a simple experiment that you can try at
home is discussed in the Appendix to this chapter.


Diffusion occurs in gases as well as liquids. Sometimes it is claimed that when 
a bottle of perfume is opened on one side of a room, the smell reaches the other 
side by a process of diffusion. This is possible in principle, but in practice it 
applies only in rooms where the air is unusually still. The scent is usually carried 
by imperceptible air currents, set up by thermal convection, which distribute 
the scent more rapidly than by diffusion alone. 


Trying to calculate the motion of individual molecules is usually an im- 
practical approach, and statistical methods must be used to understand the 
diffusion process fully. The statistical description of microscopic motion will 
be considered in Chapter 6. This chapter considers only a macroscopic de- 
scription of diffusion, based upon a natural assumption about the diffusion 
process which is called Fick’s law. In order to discuss this law, we must first 
introduce the concepts of concentration and flux density. 


3.3 Concentration and flux density 


When describing diffusion we need some measure of the quantity of a sub- 
stance which is present at some point at a given time. This is the concen- 
tration, denoted by c, which is a function of position r and time t. We also 
need a measure of the flow of material. This is the flux density J; it is a 
vector quantity, because the flow of material has a direction, and is also a 
function of r and t. 


The concentration and the flux density are most easily defined for situations 
where the system is homogeneous, meaning that the distribution of particles 
is expected to be equivalent at all points in space, so that c and J are 
independent of the position r. In each case that we consider, the definitions 
of both c and J will be given for a homogeneous system in the first instance, 
before considering the more difficult general situation. 


3.3.1 Concentration 


Let us consider first the concentration, c. In the case where a fluid is mixed
(for example, by stirring it) so that it becomes homogeneous, the
concentration of a particular type of atom (or molecule, or other particle) in the
fluid is simply the ratio of the number of atoms N to the volume of fluid V,
so

$$c = \frac{N}{V}. \qquad (3.4)$$

In the case where the system is not homogeneous, the concentration c(r, t)
at position r and time t may be defined as follows. Consider a small element
of volume $\Delta V$, with its centre at r. The number of molecules of a particular
type in this region is $\Delta N(\mathbf{r}, t)$ at time t. The concentration is
defined as

$$c(\mathbf{r}, t) = \frac{\Delta N(\mathbf{r}, t)}{\Delta V}. \qquad (3.5)$$

The concentration as defined by equation (3.5) depends upon the choice of
$\Delta V$. Clearly $\Delta V$ should be small if c(r, t) is to faithfully represent
what is happening at position r (see Figure 3.2). It is tempting to define c(r, t)
in terms of the limit of $\Delta N/\Delta V$ as $\Delta V \to 0$, but that limit would
be zero for most points in space, because almost all points do not coincide with the
position of a particle. In practice, it is sufficient to choose an extremely
small volume $\Delta V$, and the number of particles contained in this volume will
still be very large: the numbers of atoms in even the tiniest speck of material
visible to the naked eye are extremely large. The value of the concentration
will be, for all practical purposes, independent of the choice of $\Delta V$,
provided that $\Delta V$ is neither too large nor too small.


The shape of the volume element is not important, provided that it is not 
highly elongated in any direction. It could be taken to be a sphere or a cube, 
for example. 


Throughout this chapter, the symbols $\delta$ and $\Delta$ (lower and upper case
Greek ‘delta’) will be used to indicate small amounts of some quantity. Thus V
stands for volume and $\Delta V$ indicates a small volume, and $\delta t$ indicates
a small time. Sometimes, there will be two distinct small quantities considered, so
that (for example) $\delta N$ and $\Delta N$ may represent two different small
numbers of particles in the same calculation.


When we discuss numbers of particles, the quantity $\Delta N$ is small only in the
sense that it is much smaller than the total number of particles N. It will
become clear shortly that the volume element $\Delta V$ should be sufficiently large
that $\Delta N \gg 1$.


Exercise 3.3 


(This exercise is to check your understanding of the concept of concentration: no 
calculation is required. ) 


What is the appropriate definition of concentration in one and two dimensions? 


Atlases often contain data on population densities; for example, in 1983 Hong Kong 
had a population density of 5300 per square kilometre, and the United Kingdom had 
a population density of 230 per square kilometre. How is this concept of population 
density related to concentration? 


The definition of concentration is illustrated (in the two-dimensional case) 
in Figure 3.2. 


Figure 3.2 Illustrating the definition of concentration. In this two-dimensional 
example, the concentration at a point r is defined as the number of particles 
inside a disk centred at r, divided by the area of the disk. The concentration of 
particles at $\mathbf{r}_1$ is higher than that at $\mathbf{r}_2$.


It is useful to be able to express the total number of particles inside a
volume V in terms of the concentration. If the system is homogeneous,
equation (3.4) implies that the number of particles is N = cV. Let us
consider how this expression must be modified when c is not uniform across
the volume. We can divide the volume up into small elements, labelled by
$i = 1, \ldots, M$, each element i having volume $\Delta V_i$. The number of
particles $\Delta N_i$ within element i is approximately $c_i\,\Delta V_i$, where
$c_i$ is the concentration at the centre of that element. The total number of
particles is then

$$N = \sum_{i=1}^{M} \Delta N_i \approx \sum_{i=1}^{M} c_i\,\Delta V_i. \qquad (3.6)$$

If the concentration is a known function of position, we can make the volume
elements shrink in size, while still covering the volume V. The integral over
a volume is defined in terms of such a limit, so the number of particles
inside V at time t is given by the volume integral (as defined in Chapter 2
of Block 0):

$$N = \int_V dV\, c(\mathbf{r}, t). \qquad (3.7)$$
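The passage from the sum (3.6) to the integral (3.7) can be illustrated in one dimension, where the volume elements become short intervals. The concentration profile below is hypothetical (it is not taken from the text), chosen because its integral is known exactly; `particle_count` is an illustrative name.

```python
import math

def particle_count(concentration, a, b, M):
    """Approximate N by the sum of c_i * dx_i over M equal elements of the interval [a, b]."""
    dx = (b - a) / M
    # c_i is evaluated at the centre of element i, as in equation (3.6).
    return sum(concentration(a + (i + 0.5) * dx) * dx for i in range(M))

def c(x):
    """Hypothetical profile: 1e6 particles per metre at x = 0, decaying over 0.1 m."""
    return 1.0e6 * math.exp(-x / 0.1)

# Exact value of the integral of c(x) from 0 to 1, for comparison.
exact = 1.0e6 * 0.1 * (1.0 - math.exp(-1.0 / 0.1))

for M in (10, 100, 1000):
    approx = particle_count(c, 0.0, 1.0, M)
    print(M, approx, abs(approx - exact) / exact)
```

The relative error falls rapidly as the elements shrink, which is the sense in which the sum defines the integral.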


Exercise 3.4 


Suppose that the concentration of particles in a one-dimensional region between
$x = 0$ and $x = L$ is

$$c(x,t) = c_0 + c_1 \cos(\pi x/L) \exp(-\alpha t),$$

where $c_0$, $c_1$ and $\alpha$ are constants. Calculate the total number of
particles N(a) between x = 0 and x = a (where $a \le L$). Verify that N(a) is
independent of time when a = L.


3.3.2 Molar concentration 


Because the numbers of atoms in macroscopic objects are extremely large, 
it is convenient to quote numbers of atoms or molecules relative to a large 
number, the mol (pronounced ‘mole’). 


In most of the exercises, quantities will be represented by symbols which will 
not be converted into numbers, but in many practical applications scientists 
use molar concentrations, so it is important to be familiar with the concept. 


Exercise 3.5 


If $10^{20}$ molecules of methanol are dissolved in 500 ml of water and stirred until the
solution becomes homogeneous, what is the concentration of methanol in the water?
Express the answer as the number of molecules per cubic metre, and as the number
of mols per litre. [Hint: 1 m$^3$ = $10^3$ l.]


The number used in the definition of the mol is known as Avogadro’s number,
$N_A \approx 6.022 \times 10^{23}$. You might well wonder why such an awkward number is
used. The definition of the mol is related to the mass of a carbon-12 atom. 
The mol is defined by the statement that 1 mol of carbon-12 atoms has a 
mass of 12g. You need not remember the definition of the mol, but should 
be aware of its existence. The weights of atoms and molecules relative to 
carbon-12 can be found from chemical tables, and the molar quantity of a 
substance is easily obtained from its weight. 


One mol is (approximately)
equal to $6.022045 \times 10^{23}$.
Thus, for example,

$0.034\ \text{mol} \approx 2.047 \times 10^{22}$.


The original intention had 
been to define the mol as 
the number of atoms in 
one gram of hydrogen, but 
for technical reasons it is 
easier to count carbon 
atoms. 


Like other chemical elements, carbon atoms occur as different isotopes,
having different masses: most carbon atoms in nature are of the carbon-12
type.




Exercise 3.6 


The concentration of a substance can also be expressed as a mass per unit volume,
that is, as a density. The air in a room might contain carbon dioxide at a concentration
of 250 μg l$^{-1}$ (250 micrograms per litre), and carbon monoxide at 2 μg l$^{-1}$.
Each carbon dioxide molecule has mass $7.31 \times 10^{-26}$ kg and each carbon monoxide
molecule has mass $4.65 \times 10^{-26}$ kg. Express these concentrations in terms of
numbers of particles per cubic metre.


To summarise: in practical applications, a concentration may be quoted as 
the number of particles per unit volume, as mols per unit volume, or as mass 
per unit volume. 
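The three ways of quoting a concentration are related by simple conversion factors. The sketch below is illustrative only: the function names and the sample values are assumptions made for this example, and the value of Avogadro's number is the approximate one quoted above.

```python
AVOGADRO = 6.022e23             # particles per mol (approximate)
LITRES_PER_CUBIC_METRE = 1.0e3  # 1 cubic metre = 1000 litres


def number_to_molar(particles_per_m3):
    """Convert particles per cubic metre to mols per litre."""
    mols_per_m3 = particles_per_m3 / AVOGADRO
    return mols_per_m3 / LITRES_PER_CUBIC_METRE


def mass_to_number(kg_per_m3, particle_mass_kg):
    """Convert mass per cubic metre to particles per cubic metre."""
    return kg_per_m3 / particle_mass_kg


# Hypothetical sample values, not taken from the exercises.
sample_molar = number_to_molar(6.022e26)          # about 1 mol per litre
sample_from_mass = mass_to_number(1.0e-3, 5.0e-26)  # about 2e22 per cubic metre
print(sample_molar, sample_from_mass)
```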


3.3.3 Flux and flux density 


Having defined the concentration, our objective will be to describe its vari- 
ation in time and space by means of a partial differential equation, the 
diffusion equation. This equation will be derived by considering the rate at 
which particles enter or leave a small volume element. This rate is described 
using a quantity known as the flux density: this will first be defined as a scalar
quantity J, and then generalised so that it is described by a vector $\mathbf{J}$.


First, a flux $\Phi$ of particles will be defined, associated with a given surface,
and this will then be used to define the flux density. The flux is the number
of particles passing through the surface per unit time. (The surface need not
be a physical barrier, just a mathematical construct, having no influence on
the motion of the particles.) If n particles pass through the surface in time
t, the flux is

$$\Phi = \frac{n}{t}. \tag{3.8}$$

This formula is appropriate when the flow is steady, that is, if the flow does
not vary with time. Later, we shall give expressions that are valid when the
flow is changing, so that $\Phi$ is a function of time.


The definition of flux requires us to distinguish the two sides of the surface: 
if the two sides are labelled A and B, say, we consider n to be the net number 
of particles passing from side A to side B, that is, the number passing from 
A to B, minus the number passing from B to A. 


The flux density quantifies how rapidly particles cross a given surface, in 
terms of the rate of flow per unit area. In the case of a homogeneous fluid 
moving across a flat surface of area A at uniform velocity, the flux density 
is independent of position and time, and is defined as 


$$J = \frac{\Phi}{A}. \tag{3.9}$$


The definitions of flux and flux density for a steady homogeneous flow are 
illustrated in Figure 3.3. 


Figure 3.3 Illustrating the definition of flux and flux density for a steady
homogeneous flow. If n particles pass through this surface in time t, the flux is
$\Phi = n/t$. If the area of the surface is A, the flux density is $J = \Phi/A$.


Exercise 3.7 


A net is placed across a river, so that it catches all of the fish swimming along 
it. If 600 fish are caught in one day, what is the flux of fish? If the river is 5m 
wide, and the water is 1m deep, what is the flux density of fish? (Assume a steady 
homogeneous flow.)


Exercise 3.8 


During a hailstorm lasting 5 minutes, 100 g of hailstones are collected from a circular 
pan with a diameter of 20cm. Some hailstones are weighed, and their average weight 
is found to be 0.05g. Estimate the average flux density of hailstones across any 
horizontal surface during this storm. (Calculate your answer in the appropriate SI
unit.)


In general, the flux density is defined to be a function of position r and
time t. At position r, we place a small flat or nearly flat surface element
of area $\Delta A$. The normal to this surface is a vector n of unit length which
is perpendicular to this surface. The surface has two sides, A and B say,
and we distinguish these by saying that the vector n points from side A
towards side B. Between time $t$ and time $t + \delta t$, the particles crossing this
surface are observed and counted. The number of particles crossing in the
direction from A to B minus the number crossing in the opposite direction is
$\delta n$ (see Figure 3.4). The flux of particles through this small surface element
is $\Delta\Phi = \delta n/\delta t$.


The scalar flux density J of particles in the direction of n is the ratio of the
small flux $\Delta\Phi$ to the small area $\Delta A$:

$$J(\mathbf{n}, \mathbf{r}, t) = \frac{\Delta\Phi}{\Delta A} = \frac{\delta n}{\Delta A\,\delta t}. \tag{3.10}$$

This quantity depends upon the choice of $\Delta A$ and $\delta t$. However, when the
number of particles involved is very large, for a wide range of small values of
$\Delta A$ and $\delta t$ the value of $J(\mathbf{n}, \mathbf{r}, t)$ will be almost independent of the values of
these quantities. It is also possible to conceive of very irregular flows where
J is not well-defined, but we shall not consider these cases.
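The counting procedure behind equation (3.10) can be imitated with a small simulation. This is only a sketch under stated assumptions: the particles are placed at random in a box of side 2, they all move with the same speed in the +x direction, and the surface element is a square patch of the plane x = 0. The function name and all parameter values are hypothetical. For such a uniform stream the flux density is expected to be close to the concentration multiplied by the speed, which is the comparison value returned.

```python
import random


def estimate_flux_density(n_particles, speed, dt, half_width, seed=0):
    """Estimate J = delta_n / (Delta A * delta t) by counting crossings.

    Side A is x < 0 and side B is x > 0, so the normal n points along +x.
    A particle crosses the patch during dt if it starts within a distance
    speed * dt behind the plane and inside the patch in y and z.
    """
    rng = random.Random(seed)
    box_half = 1.0
    concentration = n_particles / (2.0 * box_half) ** 3
    delta_n = 0
    for _ in range(n_particles):
        x = rng.uniform(-box_half, box_half)
        y = rng.uniform(-box_half, box_half)
        z = rng.uniform(-box_half, box_half)
        if -speed * dt < x <= 0.0 and abs(y) < half_width and abs(z) < half_width:
            delta_n += 1
    area = (2.0 * half_width) ** 2
    return delta_n / (area * dt), concentration * speed


J_counted, J_expected = estimate_flux_density(200000, 1.0, 0.1, 0.5)
print(J_counted, J_expected)
```

Because the particle number is finite, the counted value fluctuates around the expected one; the fluctuations become relatively smaller as the number of particles crossing the patch grows.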


Figure 3.4 Illustrating the definition of flux density: $\delta n$ is the net number of
particles passing through the surface element of area $\Delta A$ in time $\delta t$


Now we consider how the vector flux density $\mathbf{J} = J_1\mathbf{i} + J_2\mathbf{j} + J_3\mathbf{k}$ is defined
at position r and time t. There will be a particular choice of unit vector n
for which the scalar flux density J will be a maximum (for given values of
$\Delta A$ and $\delta t$), which will be called $\mathbf{n}_{\max}$. A vector-valued flux density $\mathbf{J}$ is
defined so that

$$\mathbf{J}(\mathbf{r}, t) = J_{\max}\,\mathbf{n}_{\max}, \tag{3.11}$$

that is, it points in the direction of the greatest flux density at position r
and time t, and its magnitude, $J_{\max}$, is equal to the flux density in that
direction. We shall see in the next subsection that $\mathbf{J}$ enables us to calculate
the flux through an arbitrary surface element located at position r.


3.3.4 Relating flux to the vector flux density 


Having defined the vector flux density $\mathbf{J}$, we now consider how the flux $\Phi$
across a surface may be expressed in terms of $\mathbf{J}$. We start by discussing the
flux across a small surface element of area $\Delta A$, and proceed to considering
the flux across an arbitrary surface.


Consider the flux $\Delta\Phi$ across a small element of area of size $\Delta A$, with $\mathbf{n}$
being the unit vector normal to the surface. It will now be shown that if
the angle between the normal vector $\mathbf{n}$ and the vector $\mathbf{n}_{\max}$ is $\theta$, the flux
density is $J = J_{\max}\cos\theta$. For simplicity, we first consider the case where all of the
particles are moving in the same direction (which coincides with the direction
of the vector $\mathbf{n}_{\max}$). In Figure 3.5 the area $\Delta A$ is crossed by the same
number of particles per unit time as $\Delta A_0$, where $\Delta A_0$ is a surface element
perpendicular to $\mathbf{n}_{\max}$. The flux $\Delta\Phi$ through $\Delta A$ is therefore the same as
that through $\Delta A_0$, but the flux densities $J = \Delta\Phi/\Delta A$ and $J_{\max} = \Delta\Phi/\Delta A_0$
are different. The ratio of areas is $\Delta A_0/\Delta A = \cos\theta$, and

$$\frac{J}{J_{\max}} = \frac{\Delta A_0}{\Delta A} = \cos\theta, \tag{3.12}$$

so

$$J = J_{\max}\cos\theta = J_{\max}\,\mathbf{n}\cdot\mathbf{n}_{\max} = \mathbf{J}\cdot\mathbf{n}. \tag{3.13}$$


Equation (3.13) relies upon the definition of the scalar product $\mathbf{a}\cdot\mathbf{b}$ of two
vectors $\mathbf{a}$ and $\mathbf{b}$. If the angle between the directions of these vectors is $\theta$, and
their magnitudes are a and b, respectively, the scalar product is defined to be
equal to $ab\cos\theta$, so $\mathbf{n}\cdot\mathbf{n}_{\max} = \cos\theta$.




See Block 0, Section 2.3.1. 




Figure 3.5 The flux passing through the two elements with areas $\Delta A$ and $\Delta A_0$ is
the same


Let us see how the flux $\Delta\Phi = J\,\Delta A$ can be expressed more elegantly in vector
notation. We can introduce a vector surface element of area, $\Delta\mathbf{A} = \mathbf{n}\,\Delta A$,
associated with a surface element of area $\Delta A$ with normal vector $\mathbf{n}$. The
flux crossing the surface may then be written as $\Delta\Phi = (\mathbf{J}\cdot\mathbf{n})\,\Delta A$, that is,

$$\Delta\Phi = \mathbf{J}\cdot\Delta\mathbf{A}. \tag{3.14}$$

This is a simple and general expression for the flux through a small surface
element of area $\Delta A$: this equation, and its integrated form equation (3.18),
are the principal results of this subsection.


Let us consider how this general formula applies to the example shown in
Figure 3.5. The vector $\Delta\mathbf{A}_0$ associated with the element of area $\Delta A_0$ is
in the same direction as the vector $\mathbf{J}$. When the normal to the surface is
aligned with the direction of the vector flux density, the scalar flux density
takes its maximum value $J_{\max}$, so equation (3.14) gives $\Delta\Phi = J_{\max}\,\Delta A_0$.
The vector $\Delta\mathbf{A}$ associated with the element of area $\Delta A$ is at an angle $\theta$ to
the direction of $\mathbf{J}$, so $\Delta\Phi = J_{\max}\,\mathbf{n}_{\max}\cdot\mathbf{n}\,\Delta A = J_{\max}\,\Delta A\cos\theta = J\,\Delta A$.
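The equality of the two expressions for the flux, $\mathbf{J}\cdot\mathbf{n}\,\Delta A$ and $J_{\max}\,\Delta A\cos\theta$, can be checked with a few lines of arithmetic. The sketch below uses a made-up vector flux density, a tilt angle of 60 degrees and an area of 2; the helper names `dot` and `flux_through_element` are assumptions introduced for this example.

```python
import math


def dot(a, b):
    """Scalar product of two vectors given as tuples of components."""
    return sum(ai * bi for ai, bi in zip(a, b))


def flux_through_element(J, n, area):
    """Flux through a flat element: (J . n) times the area of the element."""
    return dot(J, n) * area


J = (3.0, 0.0, 4.0)                    # hypothetical vector flux density
J_max = math.sqrt(dot(J, J))           # magnitude of J
n_max = tuple(component / J_max for component in J)
perpendicular = (0.0, 1.0, 0.0)        # a unit vector perpendicular to n_max

theta = math.radians(60.0)             # angle between n and n_max
n = tuple(math.cos(theta) * a + math.sin(theta) * b
          for a, b in zip(n_max, perpendicular))

area = 2.0
delta_phi = flux_through_element(J, n, area)
expected = J_max * math.cos(theta) * area
print(delta_phi, expected)
```

Tilting the element away from the direction of $\mathbf{J}$ reduces the flux through it by the factor $\cos\theta$, as in equation (3.13).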


In the case where the particles do not all move with the same velocity, equation
(3.14) remains valid. This may be understood by considering the case where
the particles move with M different velocities, which are labelled by an integer
index taking values k = 1, 2, ..., M. The particles with the velocity labelled
by k give rise to a flux density $\mathbf{J}_k$. Each of these gives a separate additive
contribution to the flux $\Delta\Phi$:

$$\Delta\Phi = \sum_{k=1}^{M} \mathbf{J}_k\cdot\Delta\mathbf{A}. \tag{3.15}$$

This equation may also be written in the form $\Delta\Phi = \mathbf{J}\cdot\Delta\mathbf{A}$, which is the same
as equation (3.14), if the total flux density $\mathbf{J}$ is given by

$$\mathbf{J} = \sum_{k=1}^{M} \mathbf{J}_k. \tag{3.16}$$

Any continuous distribution of velocities may be approached by taking the limit
as $M \to \infty$, so equation (3.14) is valid in the general case.


Having obtained a general expression for the flux across an element of area
in the form $\Delta\Phi = \mathbf{J}\cdot\Delta\mathbf{A}$, we now consider how to obtain the flux across
a general surface. We divide this surface S into a large number of small
vector elements $\Delta\mathbf{A}_i$, labelled by an index $i = 1, \ldots, M$. (See Figure 3.6,
and note that the significance of the subscript label and its upper limit M
are different from those in equation (3.16).)


Figure 3.6 The flux across a surface S may be expressed as the sum of
contributions $\Delta\Phi_i$ from small vector elements $\Delta\mathbf{A}_i$


The flux $\Delta\Phi_i$ across the vector element $\Delta\mathbf{A}_i$ is approximately $\mathbf{J}_i\cdot\Delta\mathbf{A}_i$,
where $\mathbf{J}_i$ is the flux density evaluated at the centre of this element. The
total flux is

$$\Phi = \sum_{i=1}^{M} \Delta\Phi_i \approx \sum_{i=1}^{M} \mathbf{J}_i\cdot\Delta\mathbf{A}_i. \tag{3.17}$$

The accuracy of this approximation improves as we make the size of the
elements $\Delta\mathbf{A}_i$ smaller (within the limits set by the discreteness of the particles
discussed in Subsection 3.3.1, which we do not consider further). In the
limit where the size of the elements $\Delta\mathbf{A}_i$ tends to zero, this sum becomes a
surface integral, as discussed in Chapter 2 of Block 0. The flux is then given
by

$$\Phi = \int_S d\mathbf{A}\cdot\mathbf{J}. \tag{3.18}$$
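The sum over vector elements in equation (3.17) is easy to evaluate for a flat surface. The following sketch assumes a unit square in the plane z = 0 with normal $\mathbf{k}$, and a made-up flux density `example_J`; both are illustrative choices rather than anything specified in the text.

```python
def flux_through_square(J, n_cells):
    """Sum J_i . Delta A_i over the square 0 <= x, y <= 1 in the plane z = 0.

    The square is divided into n_cells**2 equal elements, each with vector
    area Delta A = k * h * h, and J is evaluated at the centre of each element.
    """
    h = 1.0 / n_cells
    dA = (0.0, 0.0, h * h)
    total = 0.0
    for i in range(n_cells):
        x = (i + 0.5) * h
        for j in range(n_cells):
            y = (j + 0.5) * h
            Jx, Jy, Jz = J(x, y, 0.0)
            total += Jx * dA[0] + Jy * dA[1] + Jz * dA[2]
    return total


def example_J(x, y, z):
    # Hypothetical flux density; only its k-component crosses this surface.
    return (y, 0.5, 1.0 + x * y)


flux = flux_through_square(example_J, 10)
print(flux)  # the surface integral of 1 + x*y over the unit square is 1.25
```

The components of `example_J` parallel to the surface make no contribution, because the scalar product with $\Delta\mathbf{A}$ picks out only the component along the normal.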


3.3.5 Defining flux density in one dimension 


Sometimes we shall consider one-dimensional situations, so it will be helpful 
to consider the definition of flux density in one dimension. 


In one dimension it is possible to define the flux $\Phi$ as the number of particles
per unit time passing a given point. There is no area perpendicular to the 
direction of flow, so it is not possible to define a true flux density. However, 
it will be convenient to use the symbol J for the flux in one-dimensional 
problems, and this will be referred to as the flux density. The motivation 
for choosing this notation and terminology is that it makes the form of the 
continuity equation (which will be introduced in the next section) equivalent 
in one, two and three dimensions. 


In one dimension, the flux density J is therefore defined as follows. Let $\delta n$
be the number of particles passing the point x from left to right, minus the
number passing from right to left, between times $t$ and $t + \delta t$. This may be
written as

$$\delta n = J\,\delta t, \tag{3.19}$$

which defines the symbol J; that is, $J = \delta n/\delta t$. If the number of particles is
sufficiently large, the quantity J(x,t) is expected to be almost independent
of the choice of $\delta t$ over a very wide range of values (in the sense that the
change $\delta J$ due to changing $\delta t$ satisfies $\delta J/J \ll 1$).




3.3.6 Summary of concentration and flux density 


Equations (3.20) and (3.21) below provide a concise summary of the important
properties of concentration c and vector flux density $\mathbf{J}$. The number
of particles N inside a volume V at time t is expressed in terms of the
concentration c via a volume integral

$$N(t) = \int_V dV\, c(\mathbf{r}, t). \tag{3.20}$$

The number of particles per unit time (the flux) $\Phi$ crossing a surface S at
time t is expressed in terms of the flux density $\mathbf{J}$ via a surface integral

$$\Phi(t) = \int_S d\mathbf{A}\cdot\mathbf{J}(\mathbf{r}, t). \tag{3.21}$$

Equations (3.20) and (3.21) could have been used as definitions of c and $\mathbf{J}$.


When defining the flux, we have a choice about which is the positive direction
for counting the particles crossing the surface. For a closed surface there is
normally an inside and an outside. We shall usually adopt the convention
that the normal to such a surface points outwards.

A closed surface is one with no boundary, such as the surface of a sphere.


Exercise 3.9 


(a) The concentration of particles is given by c(z,y,z) =1+ 42? + 4ay?. What 
is the number of particles inside the unit cube defined by the conditions
$0 \le x \le 1$, $0 \le y \le 1$, $0 \le z \le 1$?


If the concentration is measured in particles per unit volume, is the answer 
to this question meaningful? What about if the concentration is measured in 
mols per unit volume? 


(b) The flux density is J = (2+ 42+ g27)i+ 3j+a7k. What is the flux of parti- 
cles across the side of this cube defined by the condition x = 1, with the normal 
to the surface in the direction of the unit vector i? 


Exercise 3.10 


The concentration c in a three-dimensional region depends upon the distance r 
from the origin. Express the number of particles N(R) inside a sphere of radius 
R in terms of an integral over the function c(r). In the case where c(r) = 1 + 1/r,
show that $N(R) = \frac{4}{3}\pi R^3 + 2\pi R^2$.


Exercise 3.11


Consider the situation where the flux density $\mathbf{J}$ is always directed away from the
origin, and has a magnitude J which depends only upon the distance r from the
origin. Write the outward flux $\Phi(R)$ across a spherical surface of radius R centred on the
origin in terms of the flux density J(r). Show that the flux is independent of radius
if J is proportional to $1/r^2$.




3.4 The continuity equation 


In the previous section, we introduced the concepts of concentration c(r, t) 
(describing the number of particles present in the vicinity of a point), and the 
flux density J(r,t) (describing how these particles are moving). As the next 
step towards obtaining the diffusion equation, we are now going to obtain 
a partial differential equation, called the continuity equation, relating the 
flux density to the rate of change of concentration. The principle involved 
is quite simple: if the number of particles inside a small region is increasing, 
then the concentration increases. The net number of particles entering a 
region depends upon the balance of fluxes across the surface of the region, 
and if the region is very small, then this can be expressed in terms of the 
spatial derivatives of the flux density. We shall look at this first in one 
dimension, before turning to the three-dimensional case. 


Before starting the calculation, it may be helpful to say a little about the nota- 
tion. You need not take in everything before proceeding to read the calculation, 
but you may find it useful to return to this comment if you find the notation 
confusing. 


Spatial increments will be written with an upper-case delta; for example, $\Delta V$
is a small volume element. The small increment in time will be written $\delta t$.


The calculation uses three similar symbols, all representing relatively small
changes in numbers of particles:

• $\Delta N$ is the number of particles in a small volume element $\Delta V$;

• $\delta N$ is the change in the number of particles in the volume element occurring
in a short time $\delta t$;

• $\delta n$ is the number of particles passing through one surface of the volume
element in time $\delta t$.


Thus, changes preceded by $\delta$ are changes over a time interval $\delta t$, and may be
divided by $\delta t$ to obtain a rate of change.


In order that the same definitions can be used in both the one-dimensional 
case and the three-dimensional case, in one dimension the ‘volume’ of the small 
element must be interpreted as its length, and the ‘surfaces’ are the two end 
points of the element. 


3.4.1 The continuity equation in one dimension 


In one dimension, the number of particles in the small interval $[x, x + \Delta x]$
is, by definition, $\Delta N = c(x,t)\,\Delta x$ (where c(x,t) is the concentration). The
change $\delta N$ in $\Delta N$ in time $\delta t$ is given by the net number of particles entering
the interval by passing the point x from the left, minus the net number
leaving the interval by passing $x + \Delta x$ to the right:

$$\delta N = \delta n(x,t) - \delta n(x + \Delta x, t), \tag{3.22}$$

where $\delta n(x,t)$ is the net number of particles passing the point x, from left
to right, between $t$ and $t + \delta t$. Expressing $\delta N$ in terms of the flux density
J, as defined in equation (3.19), we have

$$\delta N = [J(x,t) - J(x + \Delta x, t)]\,\delta t \tag{3.23}$$

(see Figure 3.7).


Figure 3.7 Illustrating the derivation of the continuity equation in one dimension:
the number of particles within the interval is related to the concentration c, and
the number of particles entering or leaving is related to the flux density J. In time
$\delta t$, $J(x,t)\,\delta t$ particles enter the interval at $x$ and $J(x + \Delta x, t)\,\delta t$ particles leave it
at $x + \Delta x$; the interval contains $\Delta N = c(x,t)\,\Delta x$ particles.


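The change in concentration occurring in time $\delta t$ is

$$c(x, t + \delta t) - c(x, t) = \frac{\Delta N(t + \delta t) - \Delta N(t)}{\Delta x} = \frac{\delta N}{\Delta x}. \tag{3.24}$$

Dividing by $\delta t$ and substituting from equation (3.23), we have

$$\frac{c(x, t + \delta t) - c(x, t)}{\delta t} = \frac{1}{\Delta x}\frac{\delta N}{\delta t} = -\frac{J(x + \Delta x, t) - J(x, t)}{\Delta x}. \tag{3.25}$$

Taking the limits as $\Delta x \to 0$ and $\delta t \to 0$, then adding $\partial J/\partial x$ to both sides,
we obtain

$$\frac{\partial c}{\partial t} + \frac{\partial J}{\partial x} = 0. \tag{3.26}$$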


This equation is known as the continuity equation, relating flux density and 
concentration. It is a very general result. The only physical principle used 
in the derivation is that the particles move around without being created 
or destroyed: the particles are said to be conserved. This is a very common
situation, and the continuity equation is therefore one of the fundamental 
equations of applied mathematics. 


The detailed calculation presented above is rather cumbersome, but you
should note that its essential elements are quite simple. The rate of change
of concentration, $\partial c/\partial t$, must be equal to the difference in the rates at
which particles enter and leave a small interval, divided by the length of
the interval. The rates at which particles enter and leave the element are
respectively $J(x,t)$ and $J(x + \Delta x, t)$, and their difference is approximately
$-(\partial J/\partial x)\,\Delta x$. Dividing by $\Delta x$, we obtain the continuity equation
$\partial c/\partial t = -\partial J/\partial x$.


Further insight into equation (3.26) can be gained by integrating it with
respect to x on some interval (not necessarily small) of the real line [a, b].
This gives

$$\frac{\partial N}{\partial t} = J(a,t) - J(b,t), \tag{3.27}$$

where $N(t) = \int_a^b dx\, c(x,t)$ is the number of particles inside the interval [a, b].
Equation (3.27) is telling us that the rate of increase in N(t) is equal to the
flux density of particles entering the interval at x = a, namely J(a,t), minus
the flux density of particles leaving the interval at x = b, namely J(b,t).
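The bookkeeping expressed by equations (3.23) and (3.27) can be seen in a discrete form. The sketch below is an illustration under assumed values: an interval is divided into four cells, the flux densities at the five cell edges are made-up numbers, and the function name `step_counts` is hypothetical. Each cell gains what enters at its left edge and loses what leaves at its right edge, so the interior fluxes cancel in the total.

```python
def step_counts(counts, edge_flux, dt):
    """Update the particle number in each cell over a time dt.

    counts[i] is the number of particles in cell i, and edge_flux[i] and
    edge_flux[i + 1] are the flux densities J at its left and right edges.
    """
    return [n + (edge_flux[i] - edge_flux[i + 1]) * dt
            for i, n in enumerate(counts)]


counts = [10.0, 20.0, 30.0, 40.0]        # hypothetical cell occupancies
J_edges = [2.0, 1.0, -3.0, 0.5, 4.0]     # hypothetical J at the five cell edges
dt = 0.1

new_counts = step_counts(counts, J_edges, dt)
change_in_total = sum(new_counts) - sum(counts)
boundary_balance = (J_edges[0] - J_edges[-1]) * dt   # [J(a) - J(b)] * dt
print(new_counts, change_in_total, boundary_balance)
```

The change in the total number of particles depends only on the fluxes at the two ends of the interval, in agreement with equation (3.27).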




3.4.2 The continuity equation in three dimensions 


In three dimensions, the derivation is similar. Consider the rate of change 
of concentration of particles in a small cuboidal element with sides $\Delta x$, $\Delta y$,
$\Delta z$ aligned with the Cartesian axes (see Figure 3.8).


Figure 3.8 The rate of change of concentration within this volume element is 
determined by the total flux across its six surfaces 


Let the number of particles inside this element of volume $\Delta V = \Delta x\,\Delta y\,\Delta z$
be $\Delta N$, and let the change in $\Delta N$ in time $\delta t$ be $\delta N$. The quantity $\delta N$ is
equal to the sum of the net numbers of particles entering through each of the
six sides in a time interval of length $\delta t$ (see Figure 3.8). Consider one of these
sides, a rectangular surface with a constant value of the coordinate, x say
(with the other coordinates in the intervals $[y, y + \Delta y]$ and $[z, z + \Delta z]$). The
area of this surface element has magnitude $\Delta A = \Delta y\,\Delta z$, and the inward
normal to this surface element is $\mathbf{n} = \mathbf{i}$, where $\mathbf{i}$ is the unit vector aligned
with the x-axis. (In this calculation, we choose not to adopt the usual
convention that all elements of area are directed outwards from a closed
surface; instead, for convenience, we take them to be aligned with the unit
vectors i, j and k.) The net number of particles passing through this small
surface element in the direction of increasing x during time $\delta t$ is

$$\delta n_1 = \mathbf{J}\cdot\Delta\mathbf{A}\,\delta t, \tag{3.28}$$

where $\mathbf{J} = J_1\mathbf{i} + J_2\mathbf{j} + J_3\mathbf{k}$ is evaluated at the centre of the element, at
position $(x, y + \tfrac12\Delta y, z + \tfrac12\Delta z)$. The dot product $\mathbf{J}\cdot\Delta\mathbf{A}$ is simply $J_1\,\Delta y\,\Delta z$,
because the direction of $\Delta\mathbf{A} = \Delta y\,\Delta z\,\mathbf{i}$ is aligned with the x-axis. The net
number of particles passing the surface in time $\delta t$ is therefore

$$\delta n_1 = J_1(x, y + \tfrac12\Delta y, z + \tfrac12\Delta z, t)\,\Delta y\,\Delta z\,\delta t. \tag{3.29}$$
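We can now expand $J_1$ as a Taylor series about $(x, y, z, t)$:

$$J_1(x, y + \tfrac12\Delta y, z + \tfrac12\Delta z, t) = J_1(x, y, z, t) + \tfrac12\Delta y\,\frac{\partial J_1}{\partial y} + \tfrac12\Delta z\,\frac{\partial J_1}{\partial z} + \cdots \tag{3.30}$$

(where the partial derivatives are all evaluated at (x, y, z, t)). Terms proportional
to higher powers of $\Delta y$ and $\Delta z$ will be generated, but these terms
may be neglected in the limit $\Delta y \to 0$, $\Delta z \to 0$.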


Similarly, the formula for the net number of particles leaving via the opposite
face, at $x + \Delta x$, is

$$\delta n_2 = J_1(x + \Delta x, y + \tfrac12\Delta y, z + \tfrac12\Delta z, t)\,\Delta y\,\Delta z\,\delta t. \tag{3.31}$$

Note that $\delta n_2$ makes a contribution to $\delta N$ which differs in sign from $\delta n_1$
in equation (3.29), because particles crossing in the direction of increasing
x across that face are leaving the volume element. The expression for $\delta n_2$
differs from $\delta n_1$ only in that $J_1$ is evaluated at position
$(x + \Delta x, y + \tfrac12\Delta y, z + \tfrac12\Delta z)$. Expanding the value of $J_1$ about (x, y, z, t) at this position gives an
additional term of first order in $\Delta x$, namely $\Delta x\,\partial J_1/\partial x$, as well as the same
$\Delta y$ and $\Delta z$ terms as before. The difference between the numbers of particles
crossing these opposite faces shown in Figure 3.8 is thus

$$\delta n_1 - \delta n_2 = \left[J_1(x, y + \tfrac12\Delta y, z + \tfrac12\Delta z, t) - J_1(x + \Delta x, y + \tfrac12\Delta y, z + \tfrac12\Delta z, t)\right]\Delta y\,\Delta z\,\delta t \approx -\frac{\partial J_1}{\partial x}(x, y, z, t)\,\Delta x\,\Delta y\,\Delta z\,\delta t. \tag{3.32}$$


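Repeating this calculation for the differences $\delta n_3 - \delta n_4$ and $\delta n_5 - \delta n_6$ gives
similar expressions, containing $\partial J_2/\partial y$ and $\partial J_3/\partial z$, respectively. Neglecting
terms of higher order in the dimensions of the box (i.e. terms of higher order
in $\Delta x$, $\Delta y$ and $\Delta z$), the total gain in the number of particles within the
volume element in time $\delta t$ is the sum of six contributions, corresponding to
the six fluxes illustrated in Figure 3.8:

$$\delta N = \delta n_1 - \delta n_2 + \delta n_3 - \delta n_4 + \delta n_5 - \delta n_6 \approx -\left(\frac{\partial J_1}{\partial x} + \frac{\partial J_2}{\partial y} + \frac{\partial J_3}{\partial z}\right)\Delta x\,\Delta y\,\Delta z\,\delta t. \tag{3.33}$$

Dividing $\delta N$ by $\Delta V = \Delta x\,\Delta y\,\Delta z$ gives the change in concentration in time
$\delta t$, so, neglecting terms of higher order in $\Delta x$, $\Delta y$ and $\Delta z$, dividing by $\delta t$
and taking limits as $\delta t \to 0$ and $\Delta V \to 0$, we have

$$\frac{\partial c}{\partial t} = -\left(\frac{\partial J_1}{\partial x} + \frac{\partial J_2}{\partial y} + \frac{\partial J_3}{\partial z}\right). \tag{3.34}$$

The combination of derivatives in the bracket occurs frequently in discussions
of vector fields, and we saw in Block 0, Chapter 2 that it is called the
divergence of the vector field $\mathbf{J}$. There are two commonly used abbreviated
notations for the divergence of a vector field $\mathbf{F}$:

$$\frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z} = \nabla\cdot\mathbf{F} = \operatorname{div}\mathbf{F}. \tag{3.35}$$

The dot product notation in equation (3.35) arises from regarding $\nabla$ as a
vector operator with components which are differential operators:

$$\nabla = \mathbf{i}\,\frac{\partial}{\partial x} + \mathbf{j}\,\frac{\partial}{\partial y} + \mathbf{k}\,\frac{\partial}{\partial z}. \tag{3.36}$$

The divergence $\nabla\cdot\mathbf{F}$ is the scalar product of this vector differential operator
and the vector field $\mathbf{F}$. With these notational conventions, the three-dimensional
continuity equation takes the form

$$\frac{\partial c}{\partial t} + \nabla\cdot\mathbf{J} = 0. \tag{3.37}$$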
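The divergence is a sum of three ordinary partial derivatives, so it can be estimated at a point by finite differences. The sketch below assumes a made-up vector field `example_field` and a hypothetical helper `divergence`; it uses central differences with a small step h and compares the result with the divergence worked out by hand.

```python
def divergence(F, x, y, z, h=1e-4):
    """Estimate dF1/dx + dF2/dy + dF3/dz at (x, y, z) by central differences."""
    dF1_dx = (F(x + h, y, z)[0] - F(x - h, y, z)[0]) / (2.0 * h)
    dF2_dy = (F(x, y + h, z)[1] - F(x, y - h, z)[1]) / (2.0 * h)
    dF3_dz = (F(x, y, z + h)[2] - F(x, y, z - h)[2]) / (2.0 * h)
    return dF1_dx + dF2_dy + dF3_dz


def example_field(x, y, z):
    # Hypothetical field F = (x^2 y, y z, z^2 x); its divergence is 2xy + z + 2zx.
    return (x * x * y, y * z, z * z * x)


numerical = divergence(example_field, 1.0, 2.0, 3.0)
analytic = 2.0 * 1.0 * 2.0 + 3.0 + 2.0 * 3.0 * 1.0   # equals 13
print(numerical, analytic)
```

If `example_field` were a flux density of conserved particles, a positive divergence at a point would mean, by the continuity equation, that the concentration there is decreasing.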


A commonly occurring situation is that of a steady flow, i.e. with the
concentration independent of time, so that $\partial c/\partial t = 0$ at each point in space.
In this case the vector flux density $\mathbf{J}$ has zero divergence, i.e. $\nabla\cdot\mathbf{J} = 0$ at
each point. This situation is also referred to as the steady state.


The continuity equation is also important in describing the flow of fluids, 
where it expresses the fact that the total mass of the fluid remains constant. 
The derivation is similar to that for concentration. The total mass M in a
volume V is an integral of the mass density $\rho_m$, which may be a function of
both position and time:

$$M = \int_V dV\, \rho_m(\mathbf{r}, t). \tag{3.38}$$


This is analogous to equation (3.20). All of the fluid particles in the neighbourhood
of a given point are understood to be moving with a velocity $\mathbf{v}$,
which may also depend upon both position r and time t. Using arguments
similar to those of Subsection 3.3.4, it can be shown that the mass of fluid
passing through a surface element $\Delta\mathbf{A}$ in time $\delta t$ is $\delta M = \rho_m\mathbf{v}\cdot\Delta\mathbf{A}\,\delta t$. A
vector flux density for the flow of mass is defined, $\mathbf{J}_{\text{mass}} = \rho_m\mathbf{v}$, such that
the rate of change of the mass of fluid inside volume V is given by integrating
this flux density over the surface S which is the boundary of V:

$$\frac{dM}{dt} = -\int_S d\mathbf{A}\cdot\rho_m\mathbf{v}. \tag{3.39}$$

This is analogous to equation (3.21). The minus sign in equation (3.39)
occurs because we have used the convention that $d\mathbf{A}$ points in the direction
of the outward normal to the surface, and M decreases as mass flows in
this direction. All of the arguments used to obtain the continuity equation
apply equally well to equations (3.38) and (3.39). Replacing c by $\rho_m$ and $\mathbf{J}$
by $\mathbf{J}_{\text{mass}} = \rho_m\mathbf{v}$ shows that, for fluid flow, the continuity equation is of the
form

$$\frac{\partial\rho_m}{\partial t} + \nabla\cdot(\rho_m\mathbf{v}) = 0. \tag{3.40}$$


Exercise 3.12 


If a is a scalar field and $\mathbf{F}$ is a vector field, show that

$$\nabla\cdot(a\mathbf{F}) = a\,\nabla\cdot\mathbf{F} + \mathbf{F}\cdot\nabla a.$$


Exercise 3.13 

In most liquids, the mass density, $\rho_m$, can be treated as a constant (i.e. as independent
of r and t). Show that the velocity $\mathbf{v}(\mathbf{r},t)$ of such a liquid satisfies $\nabla\cdot\mathbf{v} = 0$.

Exercise 3.14 

Consider the time-independent vector fields 

(a) $\mathbf{J} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$,

(b) J = yi — 2j+ Ak, 

(c) $\mathbf{J} = (Ax + By)\mathbf{i} + (Cx + Dy)\mathbf{j}$,


where A, B, C and D are constants. In each case, evaluate $\nabla\cdot\mathbf{J}$, and determine
which of these vector fields can represent the flux density of conserved particles in
a steady flow. [Hint: Use the continuity equation (3.37).]


Show that in case (c) this is possible only if A+ D = 0. 


3.4.3 Relation with Gauss’s theorem 


The continuity equation is very closely related to Gauss’s theorem. We shall 
demonstrate this relationship by combining the continuity equation (3.37) 
with equation (3.18) to deduce Gauss’s theorem. 


Gauss’s theorem was discussed in Block 0, Chapter 2.


Let S be a closed surface, containing a volume V. Using equation (3.7) and
differentiating with respect to t, the rate of change of the number of particles
inside V is

$$\frac{dN}{dt} = \int_V dV\,\frac{\partial c}{\partial t}. \tag{3.41}$$

However, the rate, $-dN/dt$, at which particles are leaving the volume V
is (by definition) equal to the flux, $\Phi$, across the surface (in the outward
direction). Using equation (3.18) for the flux (with the convention that $d\mathbf{A}$
points in the direction of the outward normal to the surface), and equating
the result with equation (3.41), we have

$$\Phi = \int_S d\mathbf{A}\cdot\mathbf{J} = -\int_V dV\,\frac{\partial c}{\partial t}. \tag{3.42}$$

Equating the two integrals in equation (3.42), and using the continuity equation
(3.37) to substitute for $\partial c/\partial t$, we have

$$\int_S d\mathbf{A}\cdot\mathbf{J} = \int_V dV\,\nabla\cdot\mathbf{J}. \tag{3.43}$$


This equation was obtained using physical arguments in which the vector
field $\mathbf{J}(\mathbf{r},t)$ was interpreted as a flux density. There is, however, no restriction
on the choice of the vector field, and this equation must be true for any
closed surface which is sufficiently smooth that the divergence of the vector
field exists and the integrals are well-defined. We conclude that the following
identity therefore holds for any closed surface S containing a volume V
in which a differentiable vector field $\mathbf{F}$ is defined:

$$\int_S d\mathbf{A}\cdot\mathbf{F} = \int_V dV\,\nabla\cdot\mathbf{F}. \tag{3.44}$$

This is Gauss’s theorem, a fundamental result in the theory of vector calculus,
which extends the concepts of differentiation and integration to vectors.
It enables integrals over surfaces to be expressed as integrals over volumes,
and vice versa.
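Gauss's theorem can be checked numerically for a simple region. The sketch below assumes the unit cube and a made-up vector field `example_F` (deliberately different from the field in the exercise that follows); the function names are hypothetical. The surface integral is evaluated as a sum over the six faces with outward normals, and the volume integral as a sum over small cells.

```python
def midpoints(n):
    """Centres of n equal subintervals of [0, 1]."""
    h = 1.0 / n
    return [(i + 0.5) * h for i in range(n)]


def surface_flux_unit_cube(F, n=20):
    """Sum of F . dA over the six faces of the unit cube, normals outward."""
    dA = (1.0 / n) ** 2
    total = 0.0
    for u in midpoints(n):
        for v in midpoints(n):
            total += (F(1.0, u, v)[0] - F(0.0, u, v)[0]) * dA  # faces x = 1, x = 0
            total += (F(u, 1.0, v)[1] - F(u, 0.0, v)[1]) * dA  # faces y = 1, y = 0
            total += (F(u, v, 1.0)[2] - F(u, v, 0.0)[2]) * dA  # faces z = 1, z = 0
    return total


def volume_integral_unit_cube(div_F, n=20):
    """Sum of div F times dV over n**3 cells of the unit cube."""
    dV = (1.0 / n) ** 3
    pts = midpoints(n)
    return sum(div_F(x, y, z) * dV for x in pts for y in pts for z in pts)


def example_F(x, y, z):
    # Hypothetical field F = (x^2, y z, z).
    return (x * x, y * z, z)


def example_div_F(x, y, z):
    # Divergence of example_F, worked out by hand: 2x + z + 1.
    return 2.0 * x + z + 1.0


phi_surface = surface_flux_unit_cube(example_F)
phi_volume = volume_integral_unit_cube(example_div_F)
print(phi_surface, phi_volume)  # both integrals equal 2.5 for this field
```

The minus signs on the faces at x = 0, y = 0 and z = 0 arise because the outward normals there point along $-\mathbf{i}$, $-\mathbf{j}$ and $-\mathbf{k}$.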


Exercise 3.15 
Consider the vector field F = (x + y”)i+ (2y? — x°)j. 


(a) Calculate the surface integral

$$\Phi_S = \int_S d\mathbf{A}\cdot\mathbf{F},$$

where S is the surface of a cube with its six faces lying in the planes x = 0,
x = 1, y = 0, y = 1, z = 0, z = 1.


(b) Calculate $\nabla\cdot\mathbf{F}$, and the volume integral

$$\Phi_V = \int_V dV\,\nabla\cdot\mathbf{F},$$

where V is the cube with surface S. Verify that Gauss’s theorem is satisfied.




3.5 The diffusion equation 


The diffusion equation can be obtained from the continuity equation (3.37), 
together with a simple assumption about the relationship between flux den- 
sity and concentration. This assumption is often known as Fick’s law. 


Fick’s law: The flux density is proportional to the concentration gradient,
with flow proceeding from high to low concentration. In vector
notation

$$\mathbf{J} = -D\,\nabla c, \tag{3.45}$$

where D is a positive constant known as the diffusion constant.


Adolf E. Fick (1829-1901) was a German physiologist, best known in physiol- 
ogy for his work on cardiac output (1870), making possible the evaluation of 
respiratory exchange, i.e. the delivery of oxygen to bodily tissues. He is also 
credited with developing the first contact lens in 1887. He formulated his law 
of diffusion in 1855. 


Fick’s law was originally proposed as an empirical law. The negative sign 
is present because particles are expected to move against the concentration 
gradient, away from regions of higher concentration (see Figure 3.9). Fick’s 
law is a very natural guess for a relationship between the vector field J 
and the scalar field c, because taking the gradient is the simplest way to 
construct a vector field from a scalar field. 
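The sign convention in Fick's law can be illustrated in one dimension, where the gradient reduces to an ordinary derivative. The sketch below assumes a made-up concentration profile `bump` with a single peak at x = 0 and a diffusion constant D = 2; the function name `fick_flux` is hypothetical, and the derivative is estimated by a central difference.

```python
import math


def fick_flux(c, x, D, h=1e-5):
    """One-dimensional Fick's law, J = -D dc/dx, with dc/dx estimated numerically."""
    dc_dx = (c(x + h) - c(x - h)) / (2.0 * h)
    return -D * dc_dx


def bump(x):
    # Hypothetical concentration profile with its maximum at x = 0.
    return math.exp(-x * x)


D = 2.0
J_right = fick_flux(bump, 0.5, D)    # to the right of the peak
J_left = fick_flux(bump, -0.5, D)    # to the left of the peak
print(J_right, J_left)
```

To the right of the peak the flux density is positive and to the left it is negative, so in both places the particles move away from the region of highest concentration.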


Figure 3.9 Illustrating Fick’s law: the flux density of diffusing particles $\mathbf{J}(\mathbf{r},t)$ points
from regions of higher concentration to regions of lower concentration, and is
proportional to the concentration gradient. The figure shows the surfaces
$c(\mathbf{r},t) = c_0$ and $c(\mathbf{r},t) = c_0 + \delta c$. Here, $\delta c > 0$.


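To understand how this leads to the diffusion equation, consider first the
one-dimensional case. Here the flux density is given by the one-dimensional
version of Fick’s law

$$J = -D\,\frac{\partial c}{\partial x}. \tag{3.46}$$

Substituting this into the one-dimensional version of the continuity equation
(3.26) leads directly to the one-dimensional diffusion equation (3.1):

$$\frac{\partial c}{\partial t} = -\frac{\partial J}{\partial x} = -\frac{\partial}{\partial x}\left(-D\,\frac{\partial c}{\partial x}\right) = D\,\frac{\partial^2 c}{\partial x^2}. \tag{3.47}$$

In three dimensions, combining Fick’s law and the continuity equation (3.37)
gives

$$\frac{\partial c}{\partial t} = -\nabla\cdot\mathbf{J} = -\nabla\cdot(-D\,\nabla c). \tag{3.48}$$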


An empirical law is one supported by experimental observation, rather than
being deduced from fundamental principles.


We used the fact that D is a constant in the final equality of equation (3.47).


The following holds for any scalar field f: 


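$$\nabla\cdot(\nabla f) = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) + \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) + \frac{\partial}{\partial z}\left(\frac{\partial f}{\partial z}\right) = \nabla^2 f. \qquad (3.49)$$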


Using this relation in equation (3.48), and noting that D is a constant coef- 
ficient, we obtain the three-dimensional diffusion equation (3.2) 


$$\frac{\partial c}{\partial t} = D\,\nabla^2 c. \qquad (3.50)$$

If the region in which diffusion occurs is finite, boundary conditions are re- 
quired, which describe the behaviour of the concentration c at the boundary 
of the region. These are determined by physical principles, rather than 
mathematical reasoning. The most usual case is that of diffusion in a finite 
region with an impermeable boundary (i.e. no particles are able to enter or 
leave the region). An example would be diffusion of a substance through 
a liquid contained in a glass beaker: the substance remains confined to the 
liquid. The mathematical expression of this fact is the statement that, for 
all positions on the boundary, there is no flux crossing the boundary: i.e. 
$\mathbf{J}(\mathbf{r})\cdot\mathbf{n} = 0$, where $\mathbf{n}$ is a unit vector normal to the surface, for all
positions $\mathbf{r}$ on the boundary, and for all times. Using Fick’s law, this condition
is expressed in terms of the concentration:

$$\mathbf{n}\cdot\nabla c = 0 \qquad (3.51)$$

for all points on the boundary and for all times t larger than the initial time
(which is usually taken to be zero). This boundary condition is known as a 
Neumann boundary condition. 
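
As an illustration only, the combination of Fick’s law, the continuity equation and the zero-flux boundary condition can be tried out numerically in one dimension. The following Python sketch is a minimal example under stated assumptions: the function name `diffuse_1d`, the grid spacing, the time step and the initial profile are all hypothetical choices, and the explicit update used here is just one simple scheme, not a method taken from this chapter.

```python
import numpy as np

def diffuse_1d(c0, D, dx, dt, steps):
    """Explicit flux-form update of dc/dt = -dJ/dx with J = -D dc/dx."""
    c = np.asarray(c0, dtype=float).copy()
    for _ in range(steps):
        # Fick's law evaluated at the faces between neighbouring cells.
        J = -D * np.diff(c) / dx
        # Impermeable walls: the flux is zero at both ends of the region.
        J = np.concatenate(([0.0], J, [0.0]))
        # Continuity equation: each cell changes by (flux in) - (flux out).
        c -= dt * np.diff(J) / dx
    return c

D, dx = 1.0, 0.1              # assumed diffusion constant and cell width
dt = 0.4 * dx**2 / D          # this explicit scheme needs dt <= dx**2 / (2 D)
c0 = np.zeros(50)
c0[10] = 1.0 / dx             # one unit of substance placed in a single cell
c = diffuse_1d(c0, D, dx, dt, steps=2000)

print("initial total:", c0.sum() * dx)
print("final total:  ", c.sum() * dx)
print("smallest concentration:", c.min())
```

Because the flux is set to zero at both walls, whatever leaves one cell enters its neighbour, so the total amount of substance is unchanged by each step; with the time step chosen here the concentration also stays non-negative.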


Another physical requirement on the solution of the diffusion equation is that 
as c represents a concentration of particles, this quantity must always remain 
non-negative. This constraint does not restrict the solution, in that if c(r, t) 
is initially everywhere non-negative, this property remains true of the solution 
at later times. This property, that non-negative initial conditions lead to non- 
negative solutions, will become apparent from a general solution given later, in 
Chapter 4. 


The derivation of the diffusion equation given in this chapter was based upon 
an assumption about the macroscopic behaviour of diffusing substances, 
namely Fick’s law. It is desirable to have a derivation based upon micro- 
scopic principles; this will be given in Chapter 6. 


Exercise 3.16 


Consider the function c(x,t) = exp(-x/t)/t, which represents concentration as a
function of time, in the region x > 0. The particles are unable to enter the region 
x <0, so the flux density is zero at x = 0. Show that the number of particles is 
conserved. Use the continuity equation to deduce the flux density J(x,t). Does 
c(x,t) satisfy Fick’s law? 


Exercise 3.17 


Consider the scalar function f(r) = 1/r, where r is the distance from the origin.
What is the gradient of this function? By considering the flux of this gradient vector
(as defined by equation (3.18) with $\mathbf{J} = \nabla f$) over the surface bounding two spheres
of different radii centred on the origin, show that $\nabla^2 f = 0$ everywhere except at
r = 0. [Hint: Evaluating the integral in equation (3.18) is straightforward, because
$\mathbf{J}\cdot\mathbf{n}$ is constant across the surface of a sphere.]




Exercise 3.18 


Consider a three-dimensional region where the concentration at time t depends only 
upon the distance r from the origin, and is given by the function c(r,t). The vector 
flux density J must point in a radial direction, and at time t it can depend only on
r: let the scalar flux density in the radially outward direction be J(r,t). 


(a) By considering the flux of particles across the surface of a sphere of radius 
R centred on the origin, and relating this to the decrease of the number of 
particles within the sphere, show that J(r,t) can be expressed in terms of 
c(r,t) via the integral 


$$J(r,t) = -\frac{1}{r^2}\int_0^r \mathrm{d}s\; s^2\,\frac{\partial c}{\partial t}(s,t).$$


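(b) If the concentration obeys the diffusion equation (when expressed as a function
of x, y, z and t), use Fick’s law to deduce that c(r,t) satisfies

$$\frac{\partial c}{\partial t} = \frac{D}{r^2}\,\frac{\partial}{\partial r}\left(r^2\,\frac{\partial c}{\partial r}\right).$$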


3.6 The heat equation 


The diffusion equation was originally known as the heat equation. Let us 
now look at the connection. 


Heat is a form of energy, consisting of random microscopic motion of atoms.
The temperature $\theta$ of a material is a measure of the amount of heat energy
per unit volume, i.e. the concentration of heat energy. In most substances,
the temperature measured by a thermometer placed in contact with the
substance may be approximated by a linear function of the concentration of
heat energy q, which we write in the form

$$\theta = \theta_0 + \frac{1}{C\rho_m}\,(q - q_0). \qquad (3.52)$$

This is an approximate relationship, which is justified by experimental
observation and by physical theory. Here $\rho_m$ is the density of the material, C
is the mass-specific heat capacity of the material, and $\theta_0$, $q_0$ are two other
constants. This expression is a good approximation when $q - q_0$ and $\theta - \theta_0$
are small. It is not necessary to be familiar with specific heat capacities; the
important point is that $\theta$ and q are linearly related.


Exercise 3.19 
(a) What is the SI unit of q? 


(b) Roughly speaking, the temperature in Kelvins is the temperature in centigrade 
plus 273.15, the latter number being chosen so that zero Kelvins is the lowest 
attainable temperature (‘absolute zero’). What is the SI unit of mass-specific 
heat capacity? 




The SI unit of temperature 
is the Kelvin, symbol K. 




The amount of heat energy is conserved (in all of the processes that we shall 
consider), and q therefore obeys a continuity equation. 


Heat energy can move through a substance by thermal conduction, causing 
the temperature in different parts of the substance to change. You will 
become aware of thermal conduction if you hold a poker in a fire: eventually 
you will be forced to drop the poker when the heat of the fire reaches the end 
you are holding. Thermal conduction is closely related to diffusion: it can 
be thought of as diffusion of heat energy. The flux density of heat energy J 
is assumed to be proportional to the temperature gradient, so 


$$\mathbf{J} = -\kappa\,\nabla\theta, \qquad (3.53)$$

where the positive constant $\kappa$ is the thermal conductivity. This equation is
known as Fourier’s law of heat conduction. Like Fick’s law, it was an empirical
law, justified by its success in explaining experimental observations.
Note that the Fourier and Fick laws are of the same form, relating a flux
density to a gradient. In Fourier’s law, the negative sign occurs because (in
agreement with everyday experience) heat energy flows away from hotter
regions, causing temperature differences to decrease.

It was proposed by J. B.
Fourier in 1807.


Equation (3.53) applies only in the case where the material is homogeneous 
(has the same properties at all positions) and isotropic (the material properties 
have no preferred direction). Most materials satisfy these criteria. 


Exceptions include materials with a chemical composition which depends upon 
position (which would be inhomogeneous), or materials with fibres aligned 
along a given direction, such as some carbon-fibre composites (which would 
be anisotropic). 


The derivation of equation (3.53) from fundamental laws of physics is prob- 
lematic, and for some apparently reasonable mathematical models of heat con- 
duction it has been shown to fail entirely. However, for most isotropic and 
homogeneous materials at moderate temperatures, Fourier’s law does give an 
accurate description of heat conduction. 


We can obtain the heat/diffusion equation from equations (3.52) and (3.53).
We can differentiate equation (3.52) to relate the time derivative of
temperature to that of q:

$$\frac{\partial\theta}{\partial t} = \frac{1}{C\rho_m}\,\frac{\partial q}{\partial t}. \qquad (3.54)$$

Now the continuity equation for q (i.e. equation (3.37) with c replaced by
q), together with equation (3.53), gives

$$\frac{\partial q}{\partial t} = \kappa\,\nabla^2\theta. \qquad (3.55)$$

Combining the previous two equations, we see that the temperature therefore
satisfies the equation

$$\frac{\partial\theta}{\partial t} = \frac{\kappa}{C\rho_m}\,\nabla^2\theta, \qquad (3.56)$$

which is known as the heat equation. It is identical in form to the diffusion
equation. The constant $\kappa/C\rho_m$ is called the thermal diffusivity.


Thermal diffusivity has the
same dimensions as a
diffusion constant, and will
be given the same symbol
D (it will be clear from
context when we are
dealing with a thermal
diffusivity).


At the time when Fourier first formulated his law, scientists understood heat
to be an invisible fluid, called ‘caloric’. Temperature was thought of as being
the pressure of this fluid, and Fourier’s law then states that the rate of flow
of heat is proportional to the pressure gradient. This picture can be very
useful in developing intuition about problems involving the flow of heat. The
caloric model had to be abandoned when it was discovered that heat could be
transformed into mechanical energy. The modern understanding is that heat
is a form of energy, in which the energy is contained in the random motion of
atoms.


Exercise 3.20 


(a) The density, mass-specific heat capacity and thermal conductivity of air (dry, at
20°C and standard atmospheric pressure) are, respectively, $\rho_m = 1.29\ \mathrm{kg\,m^{-3}}$,
$C = 1.01\times 10^{3}\ \mathrm{J\,kg^{-1}\,K^{-1}}$ and $\kappa = 2.40\times 10^{-2}\ \mathrm{J\,m^{-1}\,s^{-1}\,K^{-1}}$. What is the
thermal diffusivity, D, of air?


(b) The diffusion constants (for molecular diffusion) for water vapour and carbon
dioxide in air are, respectively, $D_{\mathrm{H_2O}} = 2.12\times 10^{-5}\ \mathrm{m^2\,s^{-1}}$ and
$D_{\mathrm{CO_2}} = 1.29\times 10^{-5}\ \mathrm{m^2\,s^{-1}}$. Does this suggest anything about the relationship between
diffusion of heat and diffusion of molecules?


3.7 A solution of the diffusion equation


One particularly instructive solution of the diffusion equation describes what 
happens when N particles are initially concentrated at a single point, and 
then diffuse in an infinite region. We shall consider only the one-dimensional 
case for the moment; extension to two and three dimensions is considered 
briefly in Exercise 3.24, and in much more detail in Chapter 4. A suitable 
solution of the one-dimensional diffusion equation (3.1), with c replacing f, 
is a Gaussian function of the form 


$$c(x,t) = \frac{N}{\sqrt{4\pi Dt}}\,\exp(-x^2/4Dt). \qquad (3.57)$$

It can be verified by substitution that this is a solution of the diffusion
equation (3.1) when t > 0. As $t \to 0$ from above, the solution is concentrated
at x = 0 (in the sense that for any point other than x = 0, the solution
approaches zero as $t \to 0$). This solution is illustrated in Figure 3.10.
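
A quick numerical check can complement the substitution argument. The Python sketch below is illustrative only: the helper name `gaussian`, the sample points, the time t = 0.5 and the step h are assumptions made for this example. It approximates both sides of the diffusion equation by central differences and reports how closely they agree for the Gaussian of equation (3.57).

```python
import numpy as np

def gaussian(x, t, D=1.0, N=1.0):
    """Equation (3.57): N / sqrt(4 pi D t) * exp(-x**2 / (4 D t))."""
    return N / np.sqrt(4 * np.pi * D * t) * np.exp(-x**2 / (4 * D * t))

D, t, h = 1.0, 0.5, 1e-3          # assumed values for the check
x = np.linspace(-3.0, 3.0, 13)    # a few sample positions

# Central-difference estimates of dc/dt and d2c/dx2.
dc_dt = (gaussian(x, t + h, D) - gaussian(x, t - h, D)) / (2 * h)
d2c_dx2 = (gaussian(x + h, t, D) - 2 * gaussian(x, t, D)
           + gaussian(x - h, t, D)) / h**2

residual = np.max(np.abs(dc_dt - D * d2c_dx2))
print("largest |dc/dt - D d2c/dx2| on the sample points:", residual)
```

The residual is not exactly zero because finite differences are approximate, but it should be very small compared with the individual terms, which are of order one near x = 0.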


[Figure 3.10: plot with vertical axis c(x,t)/N.]


Figure 3.10 A Gaussian solution of the one-dimensional diffusion equation, in
which N particles are initially all at x = 0. In this figure, we set D = 1.




Exercise 3.21 


Check (by substitution) that (3.57) satisfies the diffusion equation (3.1) (with f 
replaced by c) when t > 0. 


Exercise 3.22 


If c(x, t) satisfies the diffusion equation, show that $c(x - x_0, t - t_0)$ also satisfies the
diffusion equation (where $x_0$ and $t_0$ are real constants). This means that solutions
can be translated (moved) in both space and time.


Because the diffusion equation is linear, any constant multiple of a solution 
is also a solution. So, if we divide c(r,t) by the total number of particles N 
to define 


$$p(\mathbf{r},t) = \frac{c(\mathbf{r},t)}{N}, \qquad (3.58)$$


then p(r,t) will also be a solution to the same diffusion equation. We see 
from equation (3.7) that such a solution satisfies 


$$\int_V \mathrm{d}V\; p(\mathbf{r},t) = 1, \qquad (3.59)$$


where V is the full volume containing all N particles. Any solution of the 
diffusion equation with this property will be called normalised. We can 
interpret a normalised solution as giving the probability density p(r,t) for 
a particle to be located in the vicinity of r at time t (hence our use of the 
symbol p). The concentration is obtained by multiplying the normalised 
solution by the total number of particles present, N.


Thus, from equations (3.57) and (3.58), the normalised solution in one di- 
mension is 


$$p(x,t) = \frac{1}{\sqrt{4\pi Dt}}\,\exp(-x^2/4Dt), \qquad (3.60)$$

which can be treated as a probability density. In the following exercise, we 
consider one of its moments. 


Exercise 3.23 


Confirm by direct integration that equation (3.60) is normalised, and that it satisfies 


$$\int_{-\infty}^{\infty} \mathrm{d}x\; x^2\, p(x,t) = 2Dt.$$


Given that a normalised solution may be interpreted as a probability density, 
this exercise shows that the mean-squared distance of particles from their 
starting point is proportional to time, i.e. 


$$\langle x^2\rangle = 2Dt. \qquad (3.61)$$


This should be compared with equation (2.18), which is a characteristic 
property of random walks. This is evidence that the underlying physical 
mechanism for diffusion is that the particles make random walks. In Chap- 
ter 6, the diffusion equation will be derived from the random walks exhibited 
by individual particles. 
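
The link between equation (3.61) and random walks can be explored with a small simulation. The Python sketch below is a minimal illustration under stated assumptions: the step length `step`, the time per step `tau`, the number of walkers and the random seed are hypothetical choices, and the identification D = step²/(2 tau) is the standard one for a symmetric walk with steps of fixed length.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable

step, tau = 1.0, 1.0                # assumed step length and time per step
D = step**2 / (2 * tau)             # diffusion constant matching this walk
n_walkers, n_steps = 10000, 400

# Every walker takes n_steps independent steps of +step or -step.
steps = rng.choice([-step, step], size=(n_walkers, n_steps))
x_final = steps.sum(axis=1)

t = n_steps * tau
msd = np.mean(x_final**2)
print("simulated <x^2>:", msd)
print("2 D t:          ", 2 * D * t)
```

The simulated mean-squared displacement fluctuates from run to run, but with this many walkers it should lie within a few per cent of 2Dt.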




Exercise 3.24 


(a) Show that a solution of the two-dimensional diffusion equation can be obtained 
as a product of two solutions of the one-dimensional equation. 


(b) Use this approach to obtain the solution of the two-dimensional diffusion equa- 
tion obtained in Exercise 3.2: 


$$p(x,y,t) = \frac{C}{t}\,\exp[-(x^2 + y^2)/4Dt].$$

Determine the constant C which makes this a normalised solution.


(c) For the normalised solution in part (b), what is the mean-squared value of
the distance to which the particles diffuse from the origin, $\langle r^2\rangle$, where
$r = \sqrt{x^2 + y^2}$? [Hint: You can use polar coordinates for the two-dimensional
integral, but it is easier if you note that the double integral factorises.]


(d) What do you think $\langle r^2\rangle$ will be in three dimensions? (In this case
$r = \sqrt{x^2 + y^2 + z^2}$.)


3.8 Summary and outcomes 


The principal ideas discussed in this chapter can be summarised as follows.
The distribution in space of the quantity of a substance is described by its 
concentration c, and the movement of substances is described by the flux 
density J. We showed that if the substance is not created or destroyed (that 
is, if it is conserved), then c and J are related by the continuity equation. We 
considered a particular type of particle-conserving process, called diffusion, 
in which substances move due to the random movement of individual atoms. 
We showed that a natural assumption about the flux density in diffusion, 
namely Fick’s law, leads to the diffusion equation. We found that diffusion 
from a point source has the property that the mean-squared distance trav- 
elled is proportional to time, which indicates a connection between random 
walks and diffusion. The same concepts are applicable to heat flow, and the 
diffusion equation is also known as the heat equation. 


After reading this chapter, you should: 

• be aware of the macroscopic signs of diffusion, and of its microscopic
physical mechanism;

• understand the definitions of the concentration c(r,t) and the flux
density J(r,t); you should be able to use these quantities to calculate the
number of particles in a volume and the flux across a surface;

• be able to calculate simple surface and volume integrals, and to relate
these using Gauss’s theorem where appropriate;

• understand the continuity equation, and its role in deriving the diffusion
equation;

• understand Fick’s law of diffusion;

• understand the relation between the diffusion equation and the flow of
heat;

• be aware of the Gaussian solution of the diffusion equation, and of its
connections with the probability distribution for a random walk.




3.9 Further Exercises 


These are harder exercises, which will challenge your understanding of the 
material. 


Exercise 3.25 


Assuming particle conservation, what is the boundary condition on the concen- 
tration at the smooth interface of two regions, numbered 1 and 2, with different 
diffusion constants, $D_1$ and $D_2$?


[Hint: The flux density across the boundary is the same on either side. Express the 
flux density on either side in terms of Fick’s law, in order to give a condition which 
must be satisfied by the concentration c.] 


Exercise 3.26 


Two layers of material are bonded together. Their thicknesses and thermal
conductivities are, respectively, $L_1$, $\kappa_1$ for material 1, and $L_2$, $\kappa_2$ for material 2. The double
layer is used to separate two fluids, one at temperature $\theta_1$, the other at $\theta_2$, with
material 1 in contact with the liquid at temperature $\theta_1$. What is the temperature
$\theta_i$ of the interface between the materials when a steady flow of heat has been
established? What is the flux density of heat energy? [Hint: Consider one-dimensional
heat flow in the direction perpendicular to the boundaries of the layers.]


Figure 3.11 Illustrating Exercise 3.26: two layers of solid material separate two
liquids at temperatures $\theta_1$ and $\theta_2$


Exercise 3.27 


(a) Pipes carrying hot water are often surrounded by a layer of insulation in order
to minimise heat loss. Consider a layer of insulation around a pipe, occupying
the region bounded by two cylinders, with inner radius $r_1$ and outer radius $r_2$,
which are kept at constant temperatures $\theta_1$ and $\theta_2$, respectively. (The
cross-section is shown in Figure 3.12.) Assuming steady heat flow, show that the
temperature at radius r (with $r_1 < r < r_2$) is

$$\theta(r) = \theta_1 + \frac{\theta_2 - \theta_1}{\ln(r_2/r_1)}\,\ln\left(\frac{r}{r_1}\right).$$


(b) What is the flux of heat per unit length of pipe? Does doubling the thickness 
of insulation halve the rate at which heat is lost? 




Figure 3.12 Illustrating Exercise 3.27: a pipe is surrounded by a layer of 
insulation 


[Hints: This problem has two simplifying features: it involves a steady state,
and there is a high degree of symmetry, so the temperature depends upon only
one coordinate, r.


Consider a thin cylindrical shell between radii r and $r + \delta r$. Because the
temperature is steady, $\partial\theta/\partial t = 0$, and the flux of heat entering the inner surface
at r is balanced by the flux leaving the shell at $r + \delta r$. The heat flux across
any cylindrical surface is therefore constant. This flux is proportional to the
temperature gradient, $\mathrm{d}\theta/\mathrm{d}r$, and to the area, which is proportional to r, so
$r\,\mathrm{d}\theta/\mathrm{d}r$ = constant.]


Exercise 3.28 


Consider the case where particles undergo diffusion in a fluid which is itself in 
motion with velocity v(r,t). (An example is where a pollutant enters a water 
system: it spreads by diffusion and also because it is being carried along by the 
flow.) Assume that the flux density due to the flow of the fluid can be added to 
that which results from diffusion. Show that the flux density due to the fluid flow 
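is $\mathbf{J}_{\mathrm{flow}} = c\mathbf{v}$, where c is the concentration. Using the continuity equation, show
that the equation for c is

$$\frac{\partial c}{\partial t} = D\,\nabla^2 c - \mathbf{v}\cdot\nabla c - (\nabla\cdot\mathbf{v})\,c.$$


This is sometimes called the diffusion–advection equation.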


Exercise 3.29 


(a) Consider the one-dimensional diffusion–advection equation where the fluid flow
is steady, so that the velocity v(x) does not depend upon t. Consider the case
where the particles are confined to the region x > 0, and the concentration
c(x) approaches zero as $x \to \infty$. Show that a steady-state solution, where the
concentration does not depend upon t, takes the form

$$c(x) = A\,\exp\left[\frac{1}{D}\int_0^x \mathrm{d}x'\; v(x')\right],$$

where A is a constant.


(b) If the fluid contains N particles, how might the constant A be determined?


(c) What is the form of the solution in the cases where (i) $v(x) = -\lambda$ and (ii)
$v(x) = -\mu x$ (where $\lambda$ and $\mu$ are constants)?




3.10 Appendix: observing diffusion in a simple experiment


If you would like to try an experiment to see diffusion for yourself, it is quite 
easily observed using a dye (such as food colouring) dissolved in water. It 
is important to avoid any macroscopic motion of the water. Figure 3.13 
shows some photographs illustrating one possible approach to doing this 
experiment. 


First fill a glass of water to the brim, and place an inverted saucer on top 
of it, as shown in Figure 3.13(a). Turn the glass upside down, then add 
water to the saucer and allow it to settle for half an hour. Add vegetable 
dye to the saucer, and finally lift the glass a tiny fraction to allow the dye to 
seep under the lip of the glass. This step is shown in Figure 3.13(b). Over 
a period of several days, the dye will diffuse into the glass of water. This
is shown in the sequence of photographs in Figure 3.13(c), the initial state
being the left-most picture.


The photographs and
experimental procedure
were kindly supplied by
Dr Graham Read.


Figure 3.13 An experiment demonstrating diffusion. 


Solutions to Exercises in Chapter 3 




Solution 3.1 
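(a) $\nabla^2 f = 2 + 2 + 2 = 6$.

(b) With $f = (x^2 + y^2 + z^2)^{-1/2}$,

$$\frac{\partial f}{\partial x} = -\frac{x}{(x^2+y^2+z^2)^{3/2}}, \qquad \frac{\partial^2 f}{\partial x^2} = -\frac{1}{(x^2+y^2+z^2)^{3/2}} + \frac{3x^2}{(x^2+y^2+z^2)^{5/2}} = \frac{2x^2 - y^2 - z^2}{(x^2+y^2+z^2)^{5/2}}.$$

Similarly,

$$\frac{\partial^2 f}{\partial y^2} = \frac{2y^2 - x^2 - z^2}{(x^2+y^2+z^2)^{5/2}}, \qquad \frac{\partial^2 f}{\partial z^2} = \frac{2z^2 - x^2 - y^2}{(x^2+y^2+z^2)^{5/2}},$$

so

$$\nabla^2 f = \frac{(2x^2 - y^2 - z^2) + (2y^2 - x^2 - z^2) + (2z^2 - x^2 - y^2)}{(x^2+y^2+z^2)^{5/2}} = 0.$$
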
Solution 3.2 
(a) The diffusion equation in two dimensions is

$$\frac{\partial p}{\partial t} = D\left[\frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2}\right].$$

(b) Consider the trial solution $p(x,y,t) = A(t)\exp[-\beta(x^2 + y^2)/t]$. We find

$$\frac{\partial p}{\partial t} = A'(t)\exp[-\beta(x^2+y^2)/t] + \frac{\beta(x^2+y^2)}{t^2}\,A(t)\exp[-\beta(x^2+y^2)/t],$$

$$\frac{\partial p}{\partial x} = -A(t)\,\frac{2\beta x}{t}\,\exp[-\beta(x^2+y^2)/t],$$

$$\frac{\partial^2 p}{\partial x^2} = -\frac{2\beta A(t)}{t}\exp[-\beta(x^2+y^2)/t] + \frac{4\beta^2 x^2}{t^2}\,A(t)\exp[-\beta(x^2+y^2)/t].$$

The expression for $\partial^2 p/\partial y^2$ is obtained by exchanging x and y in the expression
for $\partial^2 p/\partial x^2$. Substituting into the diffusion equation and cancelling the factor
$\exp[-\beta(x^2+y^2)/t]$ gives

$$A'(t) + \frac{\beta(x^2+y^2)}{t^2}\,A(t) = D\left[-\frac{4\beta A(t)}{t} + \frac{4\beta^2(x^2+y^2)}{t^2}\,A(t)\right].$$

This equation must be satisfied for all x, y and t. For any given value of t, the
terms which are independent of x and y on either side of this equation must be
equal. Similarly, the terms proportional to $(x^2 + y^2)/t^2$ must be equal, giving
$\beta A(t) = 4D\beta^2 A(t)$, so $\beta = 1/4D$. Equating the terms that are independent of
x and y gives

$$A'(t) = -\frac{4D\beta A(t)}{t} = -\frac{A(t)}{t},$$

which is a differential equation to be solved for A(t). It may be integrated
to give ln A = −ln t + constant (for A > 0 and t > 0). Exponentiating gives
A = C/t for some arbitrary constant C. The diffusion equation therefore has
a solution of the form

$$p(x,y,t) = \frac{C}{t}\,\exp[-(x^2 + y^2)/4Dt].$$




Solution 3.3 


In one dimension the concentration must be defined as the number of particles per 
unit length. 


For a uniform distribution of N particles over an interval of length L, the concen- 
tration is c= N/L. 


When the distribution of particles is non-uniform, the concentration c(x,t) at
position x and time t can be defined in terms of the number of particles $\Delta N(x, \Delta x, t)$
in a short interval between $x - \tfrac{1}{2}\Delta x$ and $x + \tfrac{1}{2}\Delta x$, divided by the length of this
interval:

$$c(x,t) = \frac{\Delta N(x, \Delta x, t)}{\Delta x}.$$

If the quantity c(x,t) is almost independent of $\Delta x$ for a wide range of values of $\Delta x$,
the concentration is well-defined.


Similarly, in two dimensions the concentration is the number of particles per unit
area.


If we interpret people as ‘particles’, population densities quoted in geographical
atlases are concentrations of people. Thus, if there are $\Delta N$ people living inside a
square region of side $\Delta x$, the population density is $c = \Delta N/(\Delta x)^2$. The length
$\Delta x$ must be chosen appropriately; for example, when estimating the population
density in an urban area, it might be appropriate to take $\Delta x$ = 1 km, but not
$\Delta x$ = 10 m.


Solution 3.4 


Using the one-dimensional version of equation (3.7), we have 


[ dx co + C1 COS e exp(—at) 


a+ ae (—at) si (=)} 
C — exp(—a a = 
. 1 = iL 


Lsi L 
Be eos el ae 
aT 


N(a) 


0 


When a = L, we find N(L) = coL, which is independent of time. 


Solution 3.5 


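The volume is 500 ml = 0.5 l = $0.5\times 10^{-3}\ \mathrm{m^3}$. The concentration is therefore

$$\begin{aligned}
c &= \frac{10^{22}}{0.5\times 10^{-3}}\ \text{molecules per m}^3 \\
  &= 2\times 10^{25}\ \text{molecules per m}^3 \\
  &= 2\times 10^{22}\ \text{molecules per litre} \\
  &= \frac{2\times 10^{22}}{6.022\times 10^{23}}\ \text{mols per litre} \\
  &\approx 3.32\times 10^{-2}\ \text{mols per litre}.
\end{aligned}$$

The concentration can be expressed as either $2\times 10^{25}\ \mathrm{m^{-3}}$ or $3.32\times 10^{-2}\ \mathrm{mol\,l^{-1}}$.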


Solution 3.6 
Using the fact that $1\ \mathrm{m^3} = 10^3\ \mathrm{l}$, we have

$$250\ \mu\mathrm{g\,l^{-1}} = 250\times 10^{-6}\ \mathrm{g\,l^{-1}} = 250\times 10^{-9}\ \mathrm{kg\,l^{-1}} = 250\times 10^{-6}\ \mathrm{kg\,m^{-3}}.$$

The mass of a carbon dioxide molecule is $7.31\times 10^{-26}$ kg, so

$$\text{number of molecules per m}^3 \text{ of carbon dioxide} = \frac{250\times 10^{-6}}{7.31\times 10^{-26}} \approx 3.42\times 10^{21}.$$

Similarly, 
. 2x iy is 
number of molecules per m” of carbon monoxide = ——————— & 4.30 x 107”. 


4.65 x 10-26 




Solution 3.7 


The flux of fish is 600 per day. To express this in terms of the SI unit of time, we
note that one day is 60 × 60 × 24 = 86 400 seconds. In the appropriate SI unit, the
flux is $\Phi = 600/86\,400 \approx 0.0069\ \mathrm{s^{-1}}$.


The cross-sectional area of the river is $A = 5 \times 1 = 5\ \mathrm{m^2}$, so the flux density is
$J = \Phi/A \approx 0.0069/5\ \mathrm{m^{-2}\,s^{-1}} \approx 0.0014\ \mathrm{m^{-2}\,s^{-1}}$.


Solution 3.8 


The approximate number of hailstones collected is n = 100/0.05 = 2000. The radius
of the pan is r = 0.1 m, so the area is $A = \pi r^2 = \pi\times(0.1)^2 \approx 0.0314\ \mathrm{m^2}$. The time
is t = 5 × 60 = 300 s. So the flux density is

$$J = \frac{n}{At} = \frac{2000}{\pi\times(0.1)^2\times 300} \approx 212\ \mathrm{m^{-2}\,s^{-1}}.$$


Solution 3.9 


(a) Using equation (3.20), the number of particles is 


N= | a (r,t) 
V 
1 1 1 
=| a | dy [ dz fl + 4a? + tay?| 
0 0 0 
2 


1 
= f de (1+ ho? + 42 [Bv4]5) 
0) 


1 
=| dx b+ 4a? + 52) 
0 


[a+ xo" + 392°] 
129 

120° 

This result is not a large number, so if the concentration is defined as the
number of particles per unit volume, it is meaningless. However, if the
concentration is measured in mols per unit volume, this figure represents
approximately $6.02\times 10^{23}\times 129/120$ particles, which is a large number. In this case
the number of particles within the cube is likely to be similar in magnitude to
the calculated result, so the answer is meaningful.


(b) The flux is calculated using equation (3.21). Let $\mathbf{J} = J_1\mathbf{i} + J_2\mathbf{j} + J_3\mathbf{k}$. Because
the normal to the surface is aligned with the unit vector $\mathbf{i}$, we have
$\mathbf{J}\cdot\mathrm{d}\mathbf{A} = J_1\,\mathrm{d}y\,\mathrm{d}z$. The integral is over the square region x = 1, 0 < y < 1, 0 < z < 1, so
the flux is


b= | dA- drt 


rae wee Se eae 


= dy dz (2 ++ a2 + 5) 
1 1 
oe dy dz (< + 52) 
0 
Se 




Solution 3.10 


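The volume of the region bounded between two concentric spheres of radii r and
$r + \delta r$ is

$$\delta V = \tfrac{4}{3}\pi(r + \delta r)^3 - \tfrac{4}{3}\pi r^3 = 4\pi r^2\,\delta r + O((\delta r)^2).$$

The contribution to the number of particles from this spherical shell is $c(r)\,\delta V$.
Integrating over r, the number of particles inside a sphere of radius R is

$$N(R) = 4\pi\int_0^R \mathrm{d}r\; r^2 c(r).$$

When c(r) = 1 + 1/r, this is

$$N(R) = 4\pi\int_0^R \mathrm{d}r\; r^2\left(1 + \frac{1}{r}\right) = 4\pi\left[\tfrac{1}{3}r^3 + \tfrac{1}{2}r^2\right]_0^R = \tfrac{4}{3}\pi R^3 + 2\pi R^2.$$
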


This result could also have been obtained by working in spherical polar coordinates 
and integrating over the angle variables. 
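
The closed form can be checked against a direct numerical integration over thin shells. The Python sketch below is only an illustration: the function name `shell_count`, the radius R = 2 and the number of shells are assumptions made for this example, and the midpoint rule is one simple choice of quadrature.

```python
import numpy as np

def shell_count(R, n=200000):
    """Midpoint-rule estimate of 4 pi * integral_0^R r**2 (1 + 1/r) dr."""
    dr = R / n
    r = (np.arange(n) + 0.5) * dr       # mid-radius of each thin shell
    return 4 * np.pi * np.sum(r**2 * (1 + 1 / r)) * dr

R = 2.0                                  # assumed radius for the check
numeric = shell_count(R)
closed_form = 4 * np.pi * R**3 / 3 + 2 * np.pi * R**2

print("sum over shells:", numeric)
print("closed form:    ", closed_form)
```

Each term in the sum is the concentration at the middle of a shell multiplied by the shell volume 4πr²δr, which mirrors the argument used in the solution.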
Solution 3.11 


The area of a spherical surface of radius r = R is $4\pi R^2$. The flux density vector is
always aligned with the outward normal, and has uniform magnitude J(R) across
this surface. The flux across this spherical surface is therefore

$$\Phi(R) = 4\pi R^2 J(R).$$

Clearly, if $J = C/R^2$ for some constant C, then the flux $\Phi = 4\pi C$ is independent
of R.
Solution 3.12 
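If $\mathbf{F} = F_1\mathbf{i} + F_2\mathbf{j} + F_3\mathbf{k}$, then

$$\begin{aligned}
\nabla\cdot(a\mathbf{F}) &= \frac{\partial(aF_1)}{\partial x} + \frac{\partial(aF_2)}{\partial y} + \frac{\partial(aF_3)}{\partial z} \\
&= a\frac{\partial F_1}{\partial x} + F_1\frac{\partial a}{\partial x} + a\frac{\partial F_2}{\partial y} + F_2\frac{\partial a}{\partial y} + a\frac{\partial F_3}{\partial z} + F_3\frac{\partial a}{\partial z} \\
&= a\,\nabla\cdot\mathbf{F} + \mathbf{F}\cdot\nabla a.
\end{aligned}$$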


Solution 3.13 
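The continuity equation for the liquid is

$$\frac{\partial\rho_m}{\partial t} + \nabla\cdot(\rho_m\mathbf{v}) = 0.$$

Since $\rho_m$ is constant, we have $\partial\rho_m/\partial t = 0$ and $\nabla\rho_m = 0$. Using the result of the
previous exercise,

$$\nabla\cdot(\rho_m\mathbf{v}) = \rho_m\,\nabla\cdot\mathbf{v} + \mathbf{v}\cdot\nabla\rho_m.$$

The second term in this equation is zero because $\rho_m$ is constant. The continuity
equation then reduces to

$$\nabla\cdot\mathbf{v} = 0.$$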




Solution 3.14 


Because the flow is steady, $\partial c/\partial t = 0$, and the continuity equation becomes
$\nabla\cdot\mathbf{J} = 0$.


(a) $\nabla\cdot\mathbf{J} = 1 + 1 + 1 = 3$. This vector field cannot be the flux density in a steady
flow.


(b) $\nabla\cdot\mathbf{J} = 0 + 0 + 0 = 0$. This vector field can be the flux density of a steady
flow.


(c) $\nabla\cdot\mathbf{J} = A + D$. This vector field can be the flux density of a steady flow only
if A + D = 0.


Solution 3.15 


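(a) For x = 0, the outward normal is $\mathbf{n} = -\mathbf{i}$. The vector field at x = 0 is
$\mathbf{F} = y^2\mathbf{i} + 2y^2\mathbf{j}$, so $\mathbf{F}\cdot\mathbf{n} = -y^2$. The surface integral is

$$\int_{x=0} \mathrm{d}\mathbf{A}\cdot\mathbf{F} = \int_0^1 \mathrm{d}y\int_0^1 \mathrm{d}z\;\mathbf{F}\cdot\mathbf{n} = \int_0^1 \mathrm{d}y\int_0^1 \mathrm{d}z\;(-y^2) = -\int_0^1 \mathrm{d}y\; y^2 = -\tfrac{1}{3}.$$

For x = 1, the outward normal is $\mathbf{n} = \mathbf{i}$, and $\mathbf{F} = (1 + y^2)\mathbf{i} + (2y^2 - 1)\mathbf{j}$, so
$\mathbf{F}\cdot\mathbf{n} = 1 + y^2$ and

$$\int_{x=1} \mathrm{d}\mathbf{A}\cdot\mathbf{F} = \int_0^1 \mathrm{d}y\int_0^1 \mathrm{d}z\;(1 + y^2) = \int_0^1 \mathrm{d}y\;(1 + y^2) = \tfrac{4}{3}.$$

For y = 0, the outward normal is $\mathbf{n} = -\mathbf{j}$, and $\mathbf{F} = x\mathbf{i} - x^3\mathbf{j}$, so $\mathbf{F}\cdot\mathbf{n} = x^3$ and

$$\int_{y=0} \mathrm{d}\mathbf{A}\cdot\mathbf{F} = \int_0^1 \mathrm{d}x\int_0^1 \mathrm{d}z\; x^3 = \tfrac{1}{4}.$$

For y = 1, the outward normal is $\mathbf{n} = \mathbf{j}$, and $\mathbf{F} = (1 + x)\mathbf{i} + (2 - x^3)\mathbf{j}$, so
$\mathbf{F}\cdot\mathbf{n} = 2 - x^3$ and

$$\int_{y=1} \mathrm{d}\mathbf{A}\cdot\mathbf{F} = \int_0^1 \mathrm{d}x\int_0^1 \mathrm{d}z\;(2 - x^3) = \tfrac{7}{4}.$$

For z = 0, the outward normal is $\mathbf{n} = -\mathbf{k}$ and $\mathbf{F}\cdot\mathbf{n} = 0$. The integral over the
surface z = 0 is therefore zero. Similarly, the integral over z = 1 vanishes.


The integral over the surface S of the cube is therefore

$$\Phi_S = \int_S \mathrm{d}\mathbf{A}\cdot\mathbf{F} = -\tfrac{1}{3} + \tfrac{4}{3} + \tfrac{1}{4} + \tfrac{7}{4} + 0 + 0 = 3.$$

(b) We have

$$\nabla\cdot\mathbf{F} = \frac{\partial}{\partial x}(x + y^2) + \frac{\partial}{\partial y}(2y^2 - x^3) + \frac{\partial}{\partial z}(0) = 1 + 4y.$$

The volume integral of this divergence is

$$\Phi_V = \int_V \mathrm{d}V\;\nabla\cdot\mathbf{F} = \int_0^1 \mathrm{d}x\int_0^1 \mathrm{d}y\int_0^1 \mathrm{d}z\;(1 + 4y) = \int_0^1 \mathrm{d}x\int_0^1 \mathrm{d}y\;(1 + 4y) = \int_0^1 \mathrm{d}y\;(1 + 4y) = 3.$$

Thus $\Phi_V = \Phi_S$, and Gauss’s theorem is satisfied.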
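
The agreement between the two integrals can also be checked numerically. The Python sketch below is illustrative only. It assumes the field $\mathbf{F} = (x + y^2)\mathbf{i} + (2y^2 - x^3)\mathbf{j}$ as read from the divergence calculation in part (b); the power of x in the j-component is an assumption here (it does not affect either total, because the contributions from the faces y = 0 and y = 1 cancel it). The names `F_x`, `F_y`, `flux_S` and `flux_V` and the grid size are hypothetical.

```python
import numpy as np

n = 400
s = (np.arange(n) + 0.5) / n               # midpoints of n equal cells in [0, 1]
U, _ = np.meshgrid(s, s, indexing="ij")    # U varies along the first axis
dA = 1.0 / n**2                            # area of one surface cell

def F_x(x, y):
    return x + y**2                        # i-component of the assumed field

def F_y(x, y):
    return 2 * y**2 - x**3                 # j-component of the assumed field

# Outward flux through the faces x = 0, 1 and y = 0, 1; the k-component is
# zero, so the faces z = 0 and z = 1 contribute nothing.
flux_S = (np.sum(F_x(1.0, U)) - np.sum(F_x(0.0, U))
          + np.sum(F_y(U, 1.0)) - np.sum(F_y(U, 0.0))) * dA

# Volume integral of div F = 1 + 4y; the integrand does not depend on x,
# so integrating over x only multiplies by the unit length of the cube.
flux_V = np.sum(1 + 4 * U) * dA

print("surface integral:", flux_S)
print("volume integral: ", flux_V)
```

Both numbers should be very close to 3, in line with the result found analytically.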




Solution 3.16 


The number of particles is

$$N = \int_0^\infty \mathrm{d}x\; c(x,t) = \frac{1}{t}\int_0^\infty \mathrm{d}x\;\exp(-x/t) = 1.$$

(Because this is not a large number, we must assume that N is measured in multiples
of some large number, such as the mol.) Hence N is constant, so the number of
particles is conserved.


Because the number of particles is conserved (that is, N is independent of time),
we are justified in using the continuity equation: we find

$$\frac{\partial J(x,t)}{\partial x} = -\frac{\partial c(x,t)}{\partial t} = \frac{t - x}{t^3}\,\exp(-x/t).$$

We have J(0,t) = 0, because the particles cannot cross x = 0. Integrating gives

$$J(x,t) = \frac{1}{t^3}\int_0^x \mathrm{d}u\;(t - u)\exp(-u/t),$$

where the lower limit of integration ensures that the condition J(0,t) = 0 is satisfied.
Using integration by parts, this gives

$$\begin{aligned}
J(x,t) &= \left[(t - u)\left(-\frac{1}{t^2}\right)\exp(-u/t)\right]_0^x - \frac{1}{t^2}\int_0^x \mathrm{d}u\;\exp(-u/t) \\
&= \frac{x - t}{t^2}\,\exp(-x/t) + \frac{1}{t} + \frac{1}{t}\Big[\exp(-u/t)\Big]_0^x \\
&= \frac{x}{t^2}\,\exp(-x/t).
\end{aligned}$$

This is clearly not proportional to $\partial c/\partial x = -\exp(-x/t)/t^2$, so Fick’s law is not
satisfied for any choice of diffusion constant D.


Given that c(x,t) is a concentration, it must represent the concentration of a
material which is not moving by a process of diffusion.
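
Both conclusions can be checked numerically. The Python sketch below is a minimal illustration with assumed sample points, time t = 1.3 and step h; the function names `c` and `J` simply encode c(x,t) = exp(−x/t)/t and the flux density J(x,t) = x exp(−x/t)/t² that follows from the continuity equation with J(0,t) = 0. It tests the continuity equation by central differences, and then forms the ratio of J to ∂c/∂x, which would be the constant −D if Fick’s law held.

```python
import numpy as np

def c(x, t):
    return np.exp(-x / t) / t             # concentration from Exercise 3.16

def J(x, t):
    return x * np.exp(-x / t) / t**2      # flux density from the continuity equation

x = np.linspace(0.1, 5.0, 50)             # assumed sample positions
t, h = 1.3, 1e-4                          # assumed time and difference step

# Continuity equation: dc/dt + dJ/dx should vanish.
dc_dt = (c(x, t + h) - c(x, t - h)) / (2 * h)
dJ_dx = (J(x + h, t) - J(x - h, t)) / (2 * h)
residual = np.max(np.abs(dc_dt + dJ_dx))

# Fick's law would require J / (dc/dx) to be the same at every x.
dc_dx = (c(x + h, t) - c(x - h, t)) / (2 * h)
ratio = J(x, t) / dc_dx

print("largest |dc/dt + dJ/dx|:", residual)
print("J / (dc/dx) ranges from", ratio.min(), "to", ratio.max())
```

The continuity residual is tiny, while the ratio varies strongly with x (it equals −x), so no single diffusion constant can relate J to the concentration gradient.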


Solution 3.17 


From the definition of the gradient vector discussed in Block 0, Chapter 2, the
gradient of the function f(r) is $f'(r)\,\hat{\mathbf{r}}$, where $f'(r)$ is the derivative of f and $\hat{\mathbf{r}}$ is a
unit vector pointing radially outwards. If f(r) = 1/r, the gradient is

$$\nabla f = -\frac{\hat{\mathbf{r}}}{r^2}.$$

You have seen in Exercise 3.11 that the flux of this vector field across a spherical
surface of radius R centred on the origin is independent of R. The surface integral
of $\nabla f$ over a spherical surface of radius $R_1$ is therefore the same as that over a
surface of radius $R_2$. Applying Gauss’s theorem, the difference between these two
surface integrals is the volume integral of the divergence of the vector field $\nabla f$ over
the region between radii $R_1$ and $R_2$. (The difference enters here because $\mathrm{d}\mathbf{A}$ points
in the direction of the outward normal to the bounding surface, so it points away
from the origin for the outer sphere but towards the origin for the inner sphere.)
The integral of the divergence of the vector field, i.e. of $\nabla^2 f$, over this region is zero.
Because of the spherical symmetry, $\nabla^2 f$ can depend only upon the distance from
the origin. Allowing $R_2$ to approach $R_1$, and noting that $\nabla^2 f$ must be a continuous
function of r, we see that $\nabla^2 f = 0$ at $R_1$ in order for the integral to equal zero. We
therefore must have $\nabla^2 f = 0$ for all points on the spherical surface at $R_1$. Because
$R_1$ is arbitrary, this argument establishes that $\nabla^2 f = 0$ everywhere except at the
origin, r = 0.




Solution 3.18 


(a) Because of the spherical symmetry of the problem, the flux density on a spherical
surface at $R$ takes the same value $J(R,t)$ everywhere on that surface. The
flux $\Phi(R)$ across this surface is then the flux density multiplied by the area
of this surface: $\Phi(R) = 4\pi R^2 J(R,t)$. Because the total number of particles is
conserved, the outward flux across this surface is equal to the rate of decrease of the
number of particles within this surface, which is obtained by differentiating
equation (3.7) with respect to time. Thus, we obtain

$$\Phi(R) = 4\pi R^2 J(R,t) = -\frac{\partial}{\partial t} \int_0^R dr\, 4\pi r^2\, c(r,t),$$

since

$$N(R,t) = \int_0^R dr\, 4\pi r^2\, c(r,t)$$

is the total number of particles inside the sphere. The flux density is
therefore obtained from the concentration via

$$J(r,t) = -\frac{1}{r^2} \int_0^r ds\, s^2\, \frac{\partial c}{\partial t}(s,t)$$

after a change in variable names ($R$ to $r$ and $r$ to $s$ inside the integral).


(b) If the concentration satisfies the diffusion equation, the flux density $J$ must
(according to Fick’s law) be proportional to the gradient of the concentration.
The magnitude of the gradient of the concentration in the radial direction is
just $\partial c/\partial r$. We therefore have

$$-D\,\frac{\partial c}{\partial r}(r,t) = -\frac{1}{r^2} \int_0^r ds\, s^2\, \frac{\partial c}{\partial t}(s,t).$$

Multiplying both sides by $-r^2$, then taking the derivative of both sides with
respect to $r$, and finally dividing both sides by $r^2$, gives the result quoted in
the exercise.


Solution 3.19 


(a) The SI unit of energy is the joule, where $1\,\mathrm{J} = 1\,\mathrm{kg\,m^2\,s^{-2}}$, so the SI unit of
energy per unit volume $q$ is therefore $\mathrm{J\,m^{-3}}$.

(b) From equation (3.52), the SI unit of mass-specific heat capacity $C$ is the unit
of $(q - q_0)/\rho_m(\theta - \theta_0)$. Since the SI unit of temperature is the kelvin, and
those of $q$ and $\rho_m$ are $\mathrm{J\,m^{-3}}$ and $\mathrm{kg\,m^{-3}}$ respectively, the SI unit for $C$ is
$\mathrm{J\,m^{-3}/(kg\,m^{-3}\,K)} = \mathrm{J\,kg^{-1}\,K^{-1}}$.


Solution 3.20 
(a) The thermal diffusivity of air is 

c 4x5 
Cp. 1.29x 1.01 x 102 


m 


Dis 1 Sx 1m 6 

(b) The thermal diffusivity D calculated in part (a) is similar in magnitude to the 
diffusion constants for gases diffusing through air quoted in the exercise. The 
similarity of these values indicates that the diffusion of heat energy has the 
same origin as the diffusion of materials: heat is transported by the random 
microscopic motion of atoms. 




110 Chapter 3. The diffusion equation 


Solution 3.21 


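If

$$c = \frac{N}{\sqrt{4\pi Dt}} \exp(-x^2/4Dt),$$

then

$$
\begin{aligned}
\frac{\partial c}{\partial t} &= -\frac{N}{2\sqrt{4\pi D}\,t^{3/2}} \exp(-x^2/4Dt) + \frac{N x^2}{4Dt^2\sqrt{4\pi Dt}} \exp(-x^2/4Dt),\\
\frac{\partial c}{\partial x} &= -\frac{xN}{2Dt\sqrt{4\pi Dt}} \exp(-x^2/4Dt),\\
\frac{\partial^2 c}{\partial x^2} &= -\frac{N}{2Dt\sqrt{4\pi Dt}} \exp(-x^2/4Dt) + \frac{x^2 N}{4D^2t^2\sqrt{4\pi Dt}} \exp(-x^2/4Dt)
= \frac{1}{D}\frac{\partial c}{\partial t},
\end{aligned}
$$

so that equation (3.1) is satisfied.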
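The algebra above can also be checked numerically. The following is a minimal sketch, not part of the course text: it assumes that equation (3.1) is the one-dimensional diffusion equation, as the derivation implies, and the function names and the step size `h` are illustrative choices. It estimates both sides by central differences and prints their difference, which should be close to zero.

```python
import math

def gaussian_c(x, t, N=1.0, D=1.0):
    """Concentration N / sqrt(4 pi D t) * exp(-x^2 / 4Dt) from Solution 3.21."""
    return N / math.sqrt(4.0 * math.pi * D * t) * math.exp(-x * x / (4.0 * D * t))

def diffusion_residual(x, t, N=1.0, D=1.0, h=1e-3):
    """Central-difference estimate of dc/dt - D d2c/dx2 at the point (x, t)."""
    dc_dt = (gaussian_c(x, t + h, N, D) - gaussian_c(x, t - h, N, D)) / (2.0 * h)
    d2c_dx2 = (gaussian_c(x + h, t, N, D) - 2.0 * gaussian_c(x, t, N, D)
               + gaussian_c(x - h, t, N, D)) / (h * h)
    return dc_dt - D * d2c_dx2

# Sample points are arbitrary; the residual is limited only by discretisation error.
for x, t in [(0.0, 1.0), (0.7, 0.5), (-1.3, 2.0)]:
    print(x, t, diffusion_residual(x, t, N=3.0, D=0.8))
```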


Solution 3.22 
Consider the function $c_1(x,t) = c(x - x_0, t - t_0)$. We have

$$\frac{\partial c_1}{\partial t}(x,t) = \frac{\partial c}{\partial t}(x - x_0, t - t_0),$$

and analogous relations hold for the first and second derivatives with respect to
$x$. If $c$ satisfies the diffusion equation at position $x - x_0$ and time $t - t_0$, then $c_1$
satisfies the diffusion equation at position $x$ and time $t$. It was assumed that $c$
satisfies the diffusion equation for all positions and times. It follows that $c_1(x,t)$
also satisfies the diffusion equation for all positions and times.


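Solution 3.23


To check normalisation, we calculate

$$\int_{-\infty}^{\infty} dx\, p(x,t) = \frac{1}{\sqrt{4\pi Dt}} \int_{-\infty}^{\infty} dx\, \exp(-x^2/4Dt) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} du\, \exp(-u^2) = 1.$$

We used the change of variable $u = x/\sqrt{4Dt}$, and applied the standard
Gaussian integral, discussed in Block 0, Chapters 1 and 2.

The second moment is given by

$$
\begin{aligned}
\langle x^2 \rangle = \int_{-\infty}^{\infty} dx\, x^2\, p(x,t) &= \frac{1}{\sqrt{4\pi Dt}} \int_{-\infty}^{\infty} dx\, x^2 \exp(-x^2/4Dt)\\
&= \frac{4Dt}{\sqrt{\pi}} \int_{-\infty}^{\infty} du\, u^2 \exp(-u^2) = \frac{4Dt}{\sqrt{\pi}}\,\frac{\sqrt{\pi}}{2} = 2Dt.
\end{aligned}
$$

Again, the change of variable is $u = x/\sqrt{4Dt}$, and we used the integral I
defined in Section 1.3.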
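As a cross-check on the two integrals in Solution 3.23, here is a short numerical sketch. It is an illustration only: the midpoint rule, the truncation of the range at 12 standard deviations, and the function names are all assumptions made for this example.

```python
import math

def p(x, t, D):
    """Normalised Gaussian exp(-x^2/4Dt) / sqrt(4 pi D t)."""
    return math.exp(-x * x / (4.0 * D * t)) / math.sqrt(4.0 * math.pi * D * t)

def moment(power, t, D, n=20000):
    """Midpoint-rule estimate of the integral of x**power * p(x, t) over x."""
    half_width = 12.0 * math.sqrt(2.0 * D * t)   # 12 standard deviations each side
    dx = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * dx
        total += x ** power * p(x, t, D)
    return total * dx

t, D = 0.3, 1.5
print("normalisation:", moment(0, t, D))                      # close to 1
print("second moment:", moment(2, t, D), "vs 2Dt =", 2 * D * t)
```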
Solution 3.24 


(a) Given two functions $f(x,t)$ and $g(x,t)$, both of which are solutions of the
one-dimensional diffusion equation, the function

$$c(x,y,t) = f(x,t)\, g(y,t)$$

is a solution of the two-dimensional diffusion equation. This is verified as
follows:

$$
\begin{aligned}
\frac{\partial c}{\partial t} &= \frac{\partial f(x,t)}{\partial t}\, g(y,t) + f(x,t)\, \frac{\partial g(y,t)}{\partial t}\\
&= D\,\frac{\partial^2 f(x,t)}{\partial x^2}\, g(y,t) + D f(x,t)\, \frac{\partial^2 g(y,t)}{\partial y^2}\\
&= D\left(\frac{\partial^2 c}{\partial x^2} + \frac{\partial^2 c}{\partial y^2}\right).
\end{aligned}
$$

The first step uses the product rule for differentiation; the second uses the fact
that both $f$ and $g$ satisfy the one-dimensional diffusion equation.




(b) Consider the case where $g = f$ in the notation of part (a), so $c(x,y,t) = f(x,t) f(y,t)$.
If $f(x,t)$ is a normalised solution of the diffusion equation in
one dimension, then

$$\int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; c(x,y,t) = \int_{-\infty}^{\infty} dx\, f(x,t) \times \int_{-\infty}^{\infty} dy\, f(y,t) = 1 \times 1 = 1,$$

so the two-dimensional solution is automatically normalised. From equation (3.60)
and Exercise 3.23, we have a normalised one-dimensional solution
of the form $f(x,t) = \exp(-x^2/4Dt)/\sqrt{4\pi Dt}$, and this leads to the
two-dimensional solution quoted in the exercise with the constant $C$ given by
$1/4\pi D$.


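(c) The mean value of $r^2 = x^2 + y^2$ is

$$
\begin{aligned}
\int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, (x^2 + y^2)\, c(x,y,t) &= \int_{-\infty}^{\infty} dx\, x^2 f(x,t) \int_{-\infty}^{\infty} dy\, f(y,t)\\
&\quad + \int_{-\infty}^{\infty} dx\, f(x,t) \int_{-\infty}^{\infty} dy\, y^2 f(y,t)\\
&= 2Dt + 2Dt\\
&= 4Dt.
\end{aligned}
$$

The final step uses the fact that the second moment of the function $f(x,t)$
used in part (b) is $2Dt$ (see Exercise 3.23).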


(d) In three dimensions, the same approach works. It turns out that (r?) = 6Dt 
in three dimensions (as might be expected by extrapolation from the one- and 
two-dimensional cases). 


Solution 3.25 


The flux densities in regions 1 and 2 are respectively $\mathbf{J}_1 = -D_1\nabla c_1$ and
$\mathbf{J}_2 = -D_2\nabla c_2$, where $c_1$ and $c_2$ are the concentrations in each region. Let $\mathbf{n}$ be the
normal to the interface, pointing from region 1 to region 2. Because particles are
neither created nor destroyed at the interface, the flux density of particles crossing
the interface on either side must be the same. The flux density of particles striking
the interface from region 1 is $J_1 = \mathbf{J}_1 \cdot \mathbf{n}$. The flux density $J_2$ in region 2 is obtained
in the same way. Equating these gives $\mathbf{J}_1 \cdot \mathbf{n} = \mathbf{J}_2 \cdot \mathbf{n}$, so

$$(D_1\nabla c_1 - D_2\nabla c_2) \cdot \mathbf{n} = 0.$$

This condition holds for all points on the interface, and for all times.


Solution 3.26 


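We assume one-dimensional flow in the $x$ direction (chosen to be perpendicular
to the boundaries), so the temperature $\theta$ satisfies $\partial\theta/\partial t = D\,\partial^2\theta/\partial x^2$, where
$D = \kappa/C\rho_m$. The origin is taken to be at the interface between the layers. Because
the flow is steady, $\partial\theta/\partial t = 0$, so $\partial^2\theta/\partial x^2 = 0$. The temperature in layer $j$ is thus
$\theta_j(x) = A_j x + B_j$, for some constants $A_j$ and $B_j$ which may take different values
in different layers. The flux density of heat is $J = -\kappa_j\,\partial\theta_j/\partial x = -\kappa_j A_j$, which is
the same in each layer. The temperature difference across layer $j$ is
$\Delta\theta_j = A_j L_j = -J L_j/\kappa_j$. (Here, $\Delta\theta_1 = \theta_i - \theta_1$ and $\Delta\theta_2 = \theta_2 - \theta_i$.)


The overall temperature difference across both layers is $\Delta\theta_1 + \Delta\theta_2$. We are told
that this difference is $\theta_2 - \theta_1$, so

$$\theta_2 - \theta_1 = -J\left(\frac{L_1}{\kappa_1} + \frac{L_2}{\kappa_2}\right) = -J\,\frac{L_1\kappa_2 + L_2\kappa_1}{\kappa_1\kappa_2},$$

and the heat flux density is thus

$$J = (\theta_1 - \theta_2)\,\frac{\kappa_1\kappa_2}{L_1\kappa_2 + L_2\kappa_1}.$$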




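The temperature difference across the first layer is then $\Delta\theta_1 = -J L_1/\kappa_1$. The
interface temperature is $\theta_i = \theta_1 + \Delta\theta_1$, i.e.

$$\theta_i = \theta_1 + \frac{(\theta_2 - \theta_1)\, L_1 \kappa_2}{L_1\kappa_2 + L_2\kappa_1}.$$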
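The two-layer result can be evaluated directly. The sketch below is illustrative only: the function name and the numerical values (layer thicknesses in metres, conductivities in W m⁻¹ K⁻¹, temperatures in °C) are hypothetical and are not taken from the exercise. It also confirms that the same flux density is obtained from the temperature gradient in each layer.

```python
def two_layer(theta1, theta2, L1, L2, kappa1, kappa2):
    """Steady heat flux density J and interface temperature theta_i for two layers."""
    denom = L1 * kappa2 + L2 * kappa1
    J = (theta1 - theta2) * kappa1 * kappa2 / denom
    theta_i = theta1 + (theta2 - theta1) * L1 * kappa2 / denom
    return J, theta_i

# Hypothetical example: a 0.1 m layer of conductivity 0.6 next to a 0.05 m layer of 0.04.
J, theta_i = two_layer(theta1=20.0, theta2=0.0, L1=0.1, L2=0.05, kappa1=0.6, kappa2=0.04)
print("flux density:", J)
print("interface temperature:", theta_i)

# Consistency check: J = -kappa_j * (temperature difference across layer j) / L_j.
print(-0.6 * (theta_i - 20.0) / 0.1, -0.04 * (0.0 - theta_i) / 0.05)
```

Most of the temperature drop occurs across the poorly conducting layer, as the formula for the interface temperature suggests.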


Solution 3.27 


(a) As discussed in the question (see the Hint), the heat flux is the same across all
cylindrical surfaces. The outward heat flux $\Phi$ across a surface of area $A$ due
to a temperature gradient $d\theta/dr$ is

$$\Phi = -\kappa A\,\frac{d\theta}{dr}.$$

In this case the area of the curved part of a cylindrical surface is $2\pi r L$, where
$L$ is the length of the cylinder, and $r$ is its radius. The heat flux is then

$$\Phi = -2\pi\kappa r L\,\frac{d\theta}{dr} = \text{constant}.$$

This is a differential equation for $\theta(r)$, which gives

$$\int d\theta = -\frac{\Phi}{2\pi\kappa L} \int \frac{dr}{r}.$$

Performing the integrations, we obtain

$$\theta(r) = -\frac{\Phi}{2\pi\kappa L} \ln r + \theta_0,$$

for some constant $\theta_0$. This solution contains two unknown constants, $\theta_0$ and
$\Phi$, which are determined by applying the boundary conditions that the
temperatures are $\theta_1$, $\theta_2$ at radii $r_1$ and $r_2$, respectively.


Writing $\theta_0 = \theta_1 + (\Phi/2\pi\kappa L) \ln r_1$, the solution reads

$$\theta(r) = \theta_1 - \frac{\Phi}{2\pi\kappa L} \ln\left(\frac{r}{r_1}\right).$$

This is seen to satisfy the condition $\theta(r_1) = \theta_1$. If the heat flux $\Phi$ is now chosen
so that

$$\frac{\Phi}{2\pi\kappa L} = \frac{\theta_1 - \theta_2}{\ln(r_2/r_1)},$$

the condition $\theta(r_2) = \theta_2$ is also satisfied, and $\theta(r)$ has the form quoted in the
exercise.


(b) The heat flux per unit length of pipe is

$$\frac{\Phi}{L} = \frac{2\pi\kappa(\theta_1 - \theta_2)}{\ln(r_2/r_1)}.$$

Because the logarithmic function increases very slowly as a function of $r_2/r_1$,
this result shows that it is difficult to reduce heat loss significantly by increasing
the thickness of insulation. Doubling the thickness of insulation does not halve
the heat loss (except in the limit as $r_2/r_1 \to 1$).
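To see how weak the logarithmic dependence is, the following sketch evaluates the heat flux per unit length for a single and a doubled insulation thickness. The pipe radius, conductivity and temperatures are hypothetical values chosen for illustration; only the formula itself comes from the solution above.

```python
import math

def heat_loss_per_length(theta1, theta2, r1, r2, kappa):
    """Heat flux per unit length, 2 pi kappa (theta1 - theta2) / ln(r2 / r1)."""
    return 2.0 * math.pi * kappa * (theta1 - theta2) / math.log(r2 / r1)

r1 = 0.01                      # inner radius (hypothetical), metres
single = heat_loss_per_length(80.0, 20.0, r1, r1 + 0.01, kappa=0.04)
double = heat_loss_per_length(80.0, 20.0, r1, r1 + 0.02, kappa=0.04)

# The ratio is ln(2)/ln(3), roughly 0.63, so the loss falls by well under a half.
print("ratio of losses after doubling the thickness:", double / single)
```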


Solution 3.28 


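In time $\delta t$, the volume of fluid passing through an element of area $\delta\mathbf{A}$ with velocity
$\mathbf{v}$ is $\delta V = \mathbf{v} \cdot \delta\mathbf{A}\,\delta t$. If the concentration of particles is $c$, the number of particles
in this volume is $\delta N = c\,\delta V$. But, from the definition of flux density,
$\delta N = \mathbf{J}_{\text{flow}} \cdot \delta\mathbf{A}\,\delta t$. Since this should hold for any choice of $\delta\mathbf{A}$, it follows that $\mathbf{J}_{\text{flow}} = c\mathbf{v}$.


This so-called advective contribution to the flux density must be added to the
diffusive flux density given by Fick’s law. The total flux density is therefore
$\mathbf{J} = \mathbf{v}c - D\nabla c$. Applying the continuity equation and using the result of Exercise 3.12
gives

$$\frac{\partial c}{\partial t} = -\nabla \cdot [\mathbf{v}c - D\nabla c] = D\,\nabla \cdot \nabla c - \mathbf{v} \cdot \nabla c - (\nabla \cdot \mathbf{v})\,c,$$

which is equivalent to the equation quoted in the exercise.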




Solution 3.29 


(a) For a steady-state solution, the concentration $c$ is a function of position $x$ only.
The diffusion-advection equation in one dimension may then be written

$$0 = -\frac{d}{dx}\left[v(x)c(x) - D\,\frac{dc}{dx}\right],$$

so $v(x)c(x) - Dc'(x) = C$, where $C$ is a constant. If $c$ approaches zero as
$x \to \infty$, we expect that its derivative $c'(x)$ also approaches zero as $x \to \infty$.
This indicates that we must have $C = 0$. So the differential equation to be
solved is

$$\frac{dc}{dx} = \frac{v(x)c(x)}{D},$$

which can be solved by, for example, separation of variables:

$$\int \frac{dc}{c} = \int dx\, \frac{v(x)}{D},$$

giving

$$\ln c(x) + \text{constant} = \frac{1}{D}\int_0^x dy\, v(y).$$

Exponentiating this last equation gives the required solution:

$$c(x) = A \exp\left[\frac{1}{D}\int_0^x dy\, v(y)\right]$$

for some constant $A$.


(b) Using equation (3.7), $N$ and $A$ are related by

$$N = \int dx\, c(x) = A \int dx\, \exp\left[\frac{1}{D}\int_0^x dy\, v(y)\right].$$

Thus $A$ can be determined from an integral.


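(c) (i) For $v(x) = -\lambda$, since

$$\int_0^x dy\, v(y) = -\lambda \int_0^x dy = -\lambda x,$$

we have

$$c(x) = A\exp(-\lambda x/D).$$

(ii) For $v(x) = -\mu x$, since

$$\int_0^x dy\, v(y) = -\mu \int_0^x dy\, y = -\tfrac{1}{2}\mu x^2,$$

we have

$$c(x) = A\exp(-\mu x^2/2D).$$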
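The general formula from part (a) can be evaluated for any velocity profile by numerical integration, and compared with the two closed forms of part (c). This is a sketch under stated assumptions: the midpoint rule, the function name `steady_profile` and the parameter values are illustrative and do not come from the exercise.

```python
import math

def steady_profile(v, x, D, A=1.0, n=2000):
    """A * exp((1/D) * integral from 0 to x of v(y) dy), using the midpoint rule."""
    dy = x / n
    integral = sum(v((i + 0.5) * dy) for i in range(n)) * dy
    return A * math.exp(integral / D)

D, x = 2.0, 3.0
drift, mu = 0.5, 0.4           # hypothetical constants for cases (i) and (ii)

# Case (i): constant velocity v(x) = -drift gives an exponential profile.
print(steady_profile(lambda y: -drift, x, D), math.exp(-drift * x / D))

# Case (ii): v(x) = -mu * x gives a Gaussian profile.
print(steady_profile(lambda y: -mu * y, x, D), math.exp(-mu * x * x / (2.0 * D)))
```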




CHAPTER 4 


Solutions of the diffusion 
equation 


4.1 Introduction 


In this chapter we survey many of the methods used to find solutions of the 
diffusion equation, 


$$\frac{\partial c}{\partial t} = D\nabla^2 c. \tag{4.1}$$


We look at solutions for various types of boundary and initial conditions. 
The physical context in which each of the solutions arises is described. In 
some cases, it is more natural to think of the function c(r,t) as representing 
a temperature rather than a concentration, and in these cases we replace 
the symbol c by @. 


We show in detail how to construct solutions of the diffusion equation satis- 
fying appropriate boundary conditions, but we shall not discuss why these 
solutions are unique. However, if the mathematical representation of a phys- 
ical problem is correctly formulated, the solution is expected to be unique, 
so the question of uniqueness need not concern us. Also, some solutions are 
obtained as infinite series, but the convergence of these series is not discussed 
in any detail. 


The diffusion equation and the wave equation are both linear partial dif- 
ferential equations, and many of the techniques of solution are common to 
both equations. Separation of variables and Fourier methods are often very 
useful in both cases; other techniques are also used, and many of the meth- 
ods may be familiar from Block I. This chapter is, however, self-contained, 
apart from making frequent reference to Chapter 3 of Block I, which cov- 
ers Fourier series and Fourier transforms. You may find it useful to briefly 
review that chapter before continuing; the sections on the convolution the- 
orem (Section 3.5) and on the Fourier transform of derivatives (Section 3.6) 
are particularly relevant. 


The discussion is organised according to the nature of the boundaries and 
the number of space dimensions of the ‘medium’ (the material body) within 
which the diffusion equation is satisfied. We consider successively an infinite 
medium, a finite medium and a semi-infinite medium. The first two of
these are self-explanatory; a semi-infinite medium is obtained by dividing an 
infinite space into two infinite regions by a boundary, then taking the region 
on one side of this boundary. The only case of a semi-infinite medium that 
we consider is where the boundary is a flat surface (which we take to be the 
surface x = 0 in Cartesian coordinates), and we shall consider the solution 
of the diffusion equation in the half-space x > 0. The semi-infinite medium
can be used to model phenomena close to the surface of a large medium,
such as temperature variations in the soil close to the surface of the Earth. 


4.2 Diffusion in an infinite medium 


An infinite medium is a region of infinite extent, without boundaries, where 
a particular equation is satisfied. We have already encountered a solution of 
the diffusion equation in an infinite medium in one dimension, in Chapter 3 
(Section 3.7). More generally, if all of the concentration is initially at $x_0$ at
time t = 0, the solution of the diffusion equation for t > 0 is

$$c(x,t) = \frac{N}{\sqrt{4\pi Dt}} \exp\left[-\frac{(x - x_0)^2}{4Dt}\right], \tag{4.2}$$

where $N$ is the total number of particles. This solution is verified by checking
that it solves the differential equation, and has the correct form at t = 0:
as t → 0 from above, the concentration approaches zero everywhere except
at $x = x_0$. Examples of this solution (at different times and with $x_0 = 0$)
were plotted in the previous chapter, as Figure 3.10. The solution should 
be considered undefined for t < 0. This solution could have been achieved 
by inspired guesswork, but it is instructive to see how it can be obtained by 
a systematic approach. This discussion will naturally lead us to a general 
solution of the diffusion equation in an infinite medium. 


The function $c(x,t)$ may be expressed in terms of its Fourier transform with
respect to $x$, i.e.

$$c(x,t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dk\, \tilde{c}(k,t) \exp(ikx). \tag{4.3}$$


This corresponds to the definition of the Fourier transform given as equa- 
tion (3.24) in Block I, Chapter 3, with both functions depending on an 
additional variable, t. Now we make use of ideas developed in Section 3.6 
of that chapter. If equation (4.3) is substituted into the one-dimensional 
form of the diffusion equation (4.1), and the derivatives are taken inside the 
integral sign, we conclude that $\tilde{c}(k,t)$ satisfies

$$\frac{\partial \tilde{c}}{\partial t}(k,t) = -Dk^2\, \tilde{c}(k,t). \tag{4.4}$$


Exercise 4.1


Show that if $\tilde{c}(k,t)$ satisfies equation (4.4), then $c(x,t)$ satisfies the one-dimensional
diffusion equation. [Hint: Note that $\partial^2/\partial x^2$ acts only upon the term $\exp(ikx)$, and
that $\frac{\partial^2}{\partial x^2}\exp(ikx) = -k^2 \exp(ikx)$.]


The essential point is that the diffusion equation becomes easier to solve af- 
ter taking the Fourier transform. In the Fourier-transformed representation, 
derivatives of c(x,t) with respect to x are transformed into multiplication 
of $\tilde{c}(k,t)$ by powers of the Fourier conjugate variable, $k$. The Fourier
transform of the concentration then satisfies an ordinary differential equation in
t rather than a partial differential equation in x and t. (Equation (4.4) can 
be regarded as an ordinary differential equation because the only derivative 
present is that with respect to t, and k can be treated as if it were a con- 
stant.) This transformation is useful because ordinary differential equations 
are usually much easier to solve than partial differential equations. 




Exercise 4.2 


Solve the equation

$$\frac{df}{dt} = -af$$

and use the solution to solve equation (4.4). Express your answer in terms of $\tilde{c}(k,0)$,
the Fourier transform of the initial concentration.


In Exercise 4.2, you should have obtained

$$\tilde{c}(k,t) = \exp(-Dk^2 t)\, \tilde{c}(k,0), \tag{4.5}$$

where the function $\tilde{c}(k,0)$ is the Fourier transform of the initial concentration,
$c(x,0)$ (we assume that this Fourier transform exists). This approach
solves the problem of determining $c(x,t)$, provided that we can perform the
two integrals required for the Fourier transform of $c(x,0)$ and the inverse
Fourier transform of $\tilde{c}(k,t)$. However, there is a more direct and more intuitive
route to the solution, which we discover by applying the convolution
theorem to equation (4.5).
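Equation (4.5) says that each Fourier component simply decays by the factor exp(−Dk²t). The sketch below illustrates this on a discrete, periodic grid, which is an assumption made so that the example stays finite; it stands in for the infinite medium and the continuous transform of the text. The hand-written discrete Fourier transform and all function names are illustrative, not the course's method.

```python
import cmath
import math

def dft(values):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(values)
    return [sum(values[j] * cmath.exp(-2j * math.pi * m * j / n) for j in range(n))
            for m in range(n)]

def idft(coeffs):
    """Inverse of dft."""
    n = len(coeffs)
    return [sum(coeffs[m] * cmath.exp(2j * math.pi * m * j / n) for m in range(n)) / n
            for j in range(n)]

def evolve_periodic(c0, length, D, t):
    """Multiply each Fourier mode of c0 by exp(-D k^2 t), then transform back."""
    n = len(c0)
    coeffs = dft(c0)
    for m in range(n):
        freq = m if m <= n // 2 else m - n          # signed mode number
        k = 2.0 * math.pi * freq / length           # wavenumber of this mode
        coeffs[m] *= math.exp(-D * k * k * t)
    return [value.real for value in idft(coeffs)]

n, length, D, t = 32, 2.0 * math.pi, 0.2, 1.5
xs = [length * j / n for j in range(n)]
c0 = [1.0 + 0.5 * math.cos(3.0 * x) for x in xs]
ct = evolve_periodic(c0, length, D, t)
# The k = 3 oscillation should have shrunk by exp(-D * 9 * t); the mean is unchanged.
print(max(ct) - 1.0, 0.5 * math.exp(-D * 9.0 * t))
```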


According to equation (4.5), $\tilde{c}(k,t)$ may be expressed as a product of two
known functions of $k$, namely $\exp(-Dk^2 t)$ and $\tilde{c}(k,0)$. Recall the convolution
theorem, considered in Block I, Chapter 3, Section 3.5, which gives an
expression for the Fourier transform of a function $h = f \otimes g$ which is the
convolution of two functions $f(x)$ and $g(x)$. This states that the Fourier
transform of $h(x)$ is proportional to the product of the Fourier transforms
of $f$ and $g$: $\tilde{h}(k) = \sqrt{2\pi}\,\tilde{f}(k)\,\tilde{g}(k)$. We write equation (4.5) in the form

$$\tilde{c}(k,t) = \sqrt{2\pi}\,\tilde{K}(k,t)\,\tilde{c}(k,0), \tag{4.6}$$


and apply the converse of the convolution theorem. We see that the concentration
$c(x,t)$ must be the convolution of the initial concentration, $c(x,0)$,
and another function $K(x,t)$ which will be called the propagator:

$$c(x,t) = \int_{-\infty}^{\infty} dx_0\, K(x - x_0, t)\, c(x_0, 0). \tag{4.7}$$

Comparison between equations (4.5) and (4.6) shows that the propagator
$K(x,t)$ is the inverse Fourier transform of a Gaussian function:
$\tilde{K}(k,t) = \exp(-Dk^2 t)/\sqrt{2\pi}$ (we consider only t > 0 throughout). Calculating the
inverse Fourier transform, we find that $K(x,t)$ is also a Gaussian function:

$$K(x,t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \exp(ikx) \exp(-Dk^2 t) = \frac{1}{\sqrt{4\pi Dt}} \exp(-x^2/4Dt). \tag{4.8}$$

(This Fourier transform pair can be deduced from Exercise 3.16 of Block
I, Chapter 3, which shows that the Fourier transform of $\exp(-x^2/4Dt)$ is
$\sqrt{2Dt}\,\exp(-Dtk^2)$. This implies that the Fourier transform of $K(x,t)$, as
given by equation (4.8), is $\tilde{K}(k,t) = \exp(-Dtk^2)/\sqrt{2\pi}$, as required.) We
also see that the function $K(x - x_0, t)$ corresponds to the special solution
considered at the start of this section, equation (4.2), divided by the total
number of particles, $N$. The propagator, $K(x - x_0, t)$, therefore has a simple
interpretation: it is the probability density function that a particular particle
reaches position $x$ at time $t$, given that it started at position $x_0$ at time t = 0.




The propagator is a very useful concept for treating linear partial differential 
equations which arise in applied mathematics and theoretical physics. It de- 
termines how a ‘field’ (in this case, the concentration) at one point in space 
and time, $(x_0, t_0)$, influences (‘propagates’ into) the solution at another point,
(x,t). Knowledge of the propagator enables solutions to be found for any initial 
condition of the field. In the more general case where there are boundaries or 
where the coefficients of the linear partial differential equation depend upon 
position, the propagator is not a function of the difference between the final 
and initial positions, but depends upon $x$ and $x_0$ separately. Similarly, when
the boundaries move, the propagator can depend upon the initial and final 
times separately. In the case of a general linear partial differential equation in 
one dimension, equation (4.7) would be written in the form 


$$c(x,t) = \int_{-\infty}^{\infty} dx_0\, K(x, x_0, t, t_0)\, c(x_0, t_0). \tag{4.9}$$

Later we shall consider cases where the propagator depends upon $x$ and $x_0$
separately, but in all of the examples considered in this course, the propagator
depends only upon the time difference, $t - t_0$. (In the above, $t_0 = 0$.)


Example 4.1


The partial differential equation

$$\frac{\partial c}{\partial t} = D\,\frac{\partial^2 c}{\partial x^2} - Rc,$$

where $D$ and $R$ are constants, can be used to model the concentration of
particles which disappear (for example, due to radioactive disintegration).
It can also model heat flow along a rod, in cases where heat is lost into the
surroundings. The constant $R$ is called a rate constant.


Determine the propagator for this equation (taking the initial time, to, equal 
to zero). 


Solution 


Fourier transformation leads to an ordinary differential equation for $\tilde{c}(k,t)$:

$$\frac{\partial \tilde{c}}{\partial t} = (-Dk^2 - R)\,\tilde{c},$$

which has the solution

$$\tilde{c}(k,t) = \exp[-(Dk^2 + R)t]\; \tilde{c}(k,0),$$

so $\tilde{K}(k,t) = \exp[-(Dk^2 + R)t]/\sqrt{2\pi}$. This differs from the previous case
only by inclusion of the factor $\exp(-Rt)$, which is independent of $k$. Calculation
of the inverse Fourier transform proceeds as before, and we obtain

$$K(x,t) = \frac{\exp(-Rt)\exp(-x^2/4Dt)}{\sqrt{4\pi Dt}}.$$
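One consequence of this propagator is that the fraction of particles that survive to time t is exp(−Rt), whatever the value of D. The following sketch checks this by integrating the propagator over x; the midpoint rule, the truncation at ten standard deviations and the parameter values are assumptions made for the illustration.

```python
import math

def decay_propagator(x, t, D, R):
    """Propagator of Example 4.1: exp(-Rt) exp(-x^2/4Dt) / sqrt(4 pi D t)."""
    return (math.exp(-R * t) * math.exp(-x * x / (4.0 * D * t))
            / math.sqrt(4.0 * math.pi * D * t))

def surviving_fraction(t, D, R, n=4000):
    """Midpoint-rule integral of the propagator over x."""
    half_width = 10.0 * math.sqrt(2.0 * D * t)
    dx = 2.0 * half_width / n
    return sum(decay_propagator(-half_width + (i + 0.5) * dx, t, D, R)
               for i in range(n)) * dx

t, D, R = 0.7, 1.3, 0.9
print(surviving_fraction(t, D, R), math.exp(-R * t))
```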


The following example and exercises provide practice in obtaining the
concentration from a convolution integral.




Example 4.2 


The error function erf(x) was defined in Chapter 1 (Subsection 1.3.1) by the
integral

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x du\, \exp(-u^2).$$

Show that the concentration which results from an initial distribution at
t = 0 which is $c_0$ for x > a, and 0 for x < a, is (for t > 0)

$$c(x,t) = \frac{c_0}{2}\left[1 + \mathrm{erf}\left(\frac{x - a}{\sqrt{4Dt}}\right)\right].$$


Solution 


Use equations (4.7) and (4.8):

$$
\begin{aligned}
c(x,t) &= \frac{1}{\sqrt{4\pi Dt}} \int_{-\infty}^{\infty} dx_0\, \exp[-(x - x_0)^2/4Dt]\; c(x_0, 0)\\
&= \frac{c_0}{\sqrt{4\pi Dt}} \int_{a}^{\infty} dx_0\, \exp[-(x - x_0)^2/4Dt],
\end{aligned}
$$

where the properties of the initial distribution, $c(x_0, 0)$, were used in the
second equality. Using the change of variable $u = (x - x_0)/\sqrt{4Dt}$, this
integral is expressed in terms of an error function, plus a standard Gaussian
integral:

$$
\begin{aligned}
c(x,t) &= -\frac{c_0}{\sqrt{\pi}} \int_{(x-a)/\sqrt{4Dt}}^{-\infty} du\, \exp(-u^2)
= \frac{c_0}{\sqrt{\pi}} \int_{-\infty}^{(x-a)/\sqrt{4Dt}} du\, \exp(-u^2)\\
&= \frac{c_0}{\sqrt{\pi}} \int_{-\infty}^{0} du\, \exp(-u^2) + \frac{c_0}{\sqrt{\pi}} \int_{0}^{(x-a)/\sqrt{4Dt}} du\, \exp(-u^2)\\
&= \frac{c_0}{2} + \frac{c_0}{2}\, \mathrm{erf}\left(\frac{x - a}{\sqrt{4Dt}}\right).
\end{aligned}
$$

Note that the first integral in the penultimate line is equal to one half of
the standard Gaussian integral discussed in Block 0, Chapter 1, because its
integrand is an even function. Figure 4.1 shows the concentration for various
times for this initial condition, in the case where $D = 1$, $c_0 = 1$ and $a = 0$.
We see that as time increases, diffusion causes particles to move further into
the initially empty region x < 0.


Figure 4.1 Concentration at successive times when the initial concentration is
$c_0 = 1$ for x > 0 and zero for x < 0. The diffusion constant is D = 1.
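The result of Example 4.2 can be compared with a direct numerical evaluation of the convolution integral (4.7) using the Gaussian propagator (4.8). This is a sketch only: the midpoint rule, the finite upper limit `hi` standing in for infinity, and the function names are assumptions made for the illustration.

```python
import math

def propagator(x, t, D):
    """Gaussian propagator exp(-x^2/4Dt) / sqrt(4 pi D t), equation (4.8)."""
    return math.exp(-x * x / (4.0 * D * t)) / math.sqrt(4.0 * math.pi * D * t)

def convolve_initial(c0, x, t, D, lo, hi, n=20000):
    """Midpoint-rule estimate of the integral of K(x - x0, t) c0(x0) over [lo, hi]."""
    dx0 = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x0 = lo + (i + 0.5) * dx0
        total += propagator(x - x0, t, D) * c0(x0)
    return total * dx0

def step_solution(x, t, D, c0=1.0, a=0.0):
    """Closed form from Example 4.2."""
    return 0.5 * c0 * (1.0 + math.erf((x - a) / math.sqrt(4.0 * D * t)))

# Initial condition: concentration 1 for x0 > 0, so integrate from 0 to a large cutoff.
for x in (-1.0, 0.5, 2.0):
    numeric = convolve_initial(lambda x0: 1.0, x, t=1.0, D=1.0, lo=0.0, hi=40.0)
    print(x, numeric, step_solution(x, t=1.0, D=1.0))
```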




Exercise 4.3 


Determine $c(x,t)$ if the concentration at time t = 0 is $c_0$ for $-L < x < L$, and zero
elsewhere. (Harder exercise.)


A good way to tackle this problem is to use the result of Example 4.2, together
with the fact that the diffusion equation is linear. Proceed as follows.


(a) Introduce a function $H(x)$ which is zero when x < 0 and unity when x > 0.
Express the initial condition of Example 4.2 in terms of this function. (This
function is called the Heaviside function or step function; it is useful in
many areas of applied mathematics.)


(b) Now express the initial condition of this problem as a sum of two terms of
the form $b\,H(x - a)$, with different values of $a$ and $b$.


(c) Use linearity to write the solution at time t.


Figure 4.2 shows the concentration for various times for the initial condition of
Exercise 4.3. The diffusion causes particles to spread away from the region in
which they are initially concentrated.


Figure 4.2 Concentration at successive times when the initial distribution is a ‘top
hat’ function. In this example the initial concentration is unity on the interval
[−2.5, 2.5], and the diffusion constant is D = 1.


Exercise 4.4 


Determine the concentration at time t if the initial concentration at time t = 0 is
$c_0(x) = A_0 + A_1\cos(kx)$. (Assume $A_0 > A_1 > 0$, so that the concentration is never
negative.)


[Hint: You may assume the integral identity

$$\int_{-\infty}^{\infty} dx\, \exp(-\alpha x^2) \cos(\beta x) = \sqrt{\frac{\pi}{\alpha}}\, \exp(-\beta^2/4\alpha),$$

and recall that $\cos(A - B) = \cos A \cos B + \sin A \sin B$.]


Note that the amplitude of the oscillations in the concentration decreases
exponentially with time, decreasing by a factor e when the time increases by $\tau = 1/(Dk^2)$.

In both of the plots above (Figures 4.1 and 4.2), we set D = 1. If we want 
to understand the form of the solutions for other values of D, these plots 
are still informative. Note that the diffusion constant and time always occur 
in the solutions as the product Dt, so increasing D has the same effect as 
decreasing t by the same factor. For example, if the diffusion constant were 
D = 10, the same curves would be obtained in Figures 4.1 and 4.2, but with 
the time labels changed to t = 0.01, t = 0.1 and t = 1. 
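The scaling with the product Dt is easy to confirm numerically. The sketch below uses the closed-form result of Example 4.2 (with a = 0 and unit initial concentration); the function name and the sample positions are illustrative. A profile computed with D = 1 at t = 0.1 coincides with one computed with D = 10 at t = 0.01.

```python
import math

def step_profile(x, t, D, c0=1.0, a=0.0):
    """Concentration from Example 4.2: (c0/2) [1 + erf((x - a)/sqrt(4Dt))]."""
    return 0.5 * c0 * (1.0 + math.erf((x - a) / math.sqrt(4.0 * D * t)))

# Both columns have Dt = 0.1, so they should agree at every position.
for x in (-2.0, 0.0, 1.5):
    print(x, step_profile(x, t=0.1, D=1.0), step_profile(x, t=0.01, D=10.0))
```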




4.3 Solution in a finite medium in 
one dimension 


Many problems involving the diffusion equation concern situations in which 
diffusion occurs in a finite region. A good example is the flow of heat along a 
metal rod, for which the temperature satisfies the diffusion equation. Let us 
consider a rod with length L and a uniform cross-section, with the thickness 
of the rod being small compared to L. The coordinate x measures distance 
along the rod, with x = 0 and x = L being the ends of the rod (see Figure 
4.3). The distribution of temperature θ in a homogeneous system has already
been shown to satisfy the three-dimensional diffusion equation. If the rod is 
thermally insulated along its length so that no heat enters or leaves it (except 
at its ends), the temperature rapidly becomes very nearly uniform across the 
cross-section of the rod, so that the temperature distribution at time t is a 
function of just one coordinate, the distance x along the rod. It takes a much 
longer time for the entire rod to reach the same temperature; during this 
time the temperature satisfies a one-dimensional diffusion equation (because 
the derivatives of θ with respect to y and z are negligible).


Figure 4.3 The temperature distribution θ(x,t) of a thermally insulated metal bar
obeys the one-dimensional diffusion equation. (Figure labels: thermal insulation;
metal bar, temperature θ(x,t).)


Heat loss can be reduced by surrounding the bar with a material with very 
low thermal conductivity, such as polystyrene foam. In practice, the one- 
dimensional diffusion equation is also a fair model for a heavy rod (such as 
a poker), made from a metal such as steel which is a good conductor of heat, 
suspended in air, as illustrated in Figure 4.4. This rod would be described as 
being thermally isolated, because the air has little capacity to absorb and carry 
away heat, rather than thermally insulated. 


Figure 4.4 A metal rod is heated non-uniformly, and we follow how the
temperature at different positions changes after the heat source is removed. The
temperature in a long, thin, thermally isolated rod rapidly becomes uniform
across its cross-section. The temperature distribution θ(x,t) then obeys the
one-dimensional diffusion equation. (Figure label: temperature θ(x,t).)


This problem of determining how the temperature of a solid body varies with 
position and time has an important place in the history of mathematics. Fourier 
series were originally invented to solve just this problem, the flow of heat in 
a finite system. Fourier’s work was rather more general than our approach,
because he also treated loss of heat from the rod to its surroundings, using the
partial differential equation introduced in Example 4.1. 


Now we consider the method for solving the diffusion equation in this finite
system. The temperature at position x along a thermally insulated bar is
initially $\theta_0(x)$ at time t = 0. The ends of the bar are assumed to be at
x = 0 and at x = L. The aim is to calculate the time dependence of the
temperature for each position along the rod, described by a function of two
variables, $\theta(x,t)$.


We start by describing all of the information we have about the problem, as
illustrated in Figure 4.4, in mathematical terms. We shall assume that the
initial temperature of the rod at time t = 0 is known, and that it is given
by a function $\theta_0(x)$. We know that the temperature distribution satisfies
the one-dimensional diffusion equation. Also, because the bar is thermally
isolated, there is no heat flow at the ends of the bar, at x = 0 and x = L.
Because the heat flow is proportional to the temperature gradient, $\partial\theta/\partial x$ is
equal to zero at those points. The temperature distribution $\theta(x,t)$ therefore
satisfies

$$
\begin{aligned}
\frac{\partial\theta}{\partial t} &= D\,\frac{\partial^2\theta}{\partial x^2} && \text{(diffusion equation, valid for } t > 0\text{)},\\
\frac{\partial\theta}{\partial x}(0,t) &= \frac{\partial\theta}{\partial x}(L,t) = 0 && \text{(boundary conditions, valid for } t > 0\text{)},\\
\theta(x,0) &= \theta_0(x) && \text{(initial condition at } t = 0\text{)}.
\end{aligned}
\tag{4.10}
$$


The statement of the problem in equations (4.10) thus involves three ele- 
ments: these are the partial differential equation, the boundary conditions, 
and the initial conditions. 


Partial differential equations are difficult to solve, and it is a good idea to 
attempt to simplify the problem by reducing it to ordinary differential equa- 
tions. We use the method of separation of variables. This was introduced 
in Block I, Chapter 2 as a method for solving the wave equation, but you 
will see that the same procedure works for the diffusion equation. 


4.3.1 Separation of variables 


We write $\theta(x,t)$ as a product

$$\theta(x,t) = f(x)\, g(t), \tag{4.11}$$


where f and g are two functions which are to be determined. Equation 
(4.11) is substituted into the diffusion equation in order to discover what 
must be required of the unknown functions f and g. The resulting equation 
can be arranged in the form 


$$\frac{g'(t)}{g(t)} = D\,\frac{f''(x)}{f(x)}. \tag{4.12}$$

(Primes denote derivatives: $f'(x) = df/dx$, etc.)

Note that the left-hand side of equation (4.12) is a function of t only, but
the right-hand side is a function of x only. These statements imply that
both sides of the equation are constant, independent of both x and t. This
constant is called the separation constant. We shall initially assume that the
separation constant must be negative or zero, and it is convenient to write
it as $-Dk^2$, where $k$ is a real number to be determined. The functions $f$
and $g$ therefore satisfy the independent ordinary differential equations

$$f'' + k^2 f = 0 \tag{4.13}$$

and

$$g' + Dk^2 g = 0. \tag{4.14}$$




For $k \neq 0$, these equations have solutions $f(x) = A\cos(kx) + B\sin(kx)$ (where
$A$ and $B$ are constants) and $g(t) = \exp(-Dk^2 t)$. In the special case where
$k = 0$, we have $f(x) = A + Bx$, and $g = 1$. Note that because $f(x)$ and $g(t)$
occur as a product in solution (4.11) for $\theta(x,t)$, the arbitrary constant of
integration coming from equation (4.14) for $g(t)$ is absorbed into the constants
$A$ and $B$ appearing in the solution for $f(x)$.


4.3.2 Identifying eigenfunctions 


Having identified the functions $f(x)$ and $g(t)$, we now consider the boundary
conditions. The solution $\theta(x,t) = f(x)g(t)$ should satisfy $\partial\theta/\partial x = 0$ at
x = 0 and at x = L. This implies that the derivative $f'(x)$ must equal zero
at these points: that is, the function $f(x)$ should satisfy the boundary
conditions $f'(0) = f'(L) = 0$. The functions $f(x)$ satisfying both the differential
equation (4.13) and the boundary conditions are called eigenfunctions, and
the corresponding values of $k^2$ are the eigenvalues of equation (4.13).


The condition $f'(0) = 0$ can be satisfied only if $B = 0$, so $f(x) = A\cos(kx)$.
The condition $f'(L) = 0$ is then satisfied by choosing $k$ such that $kL = n\pi$,
with $n = 0, 1, 2, 3, \ldots$. The eigenfunctions are therefore $f_n(x) = A\cos(n\pi x/L)$,
and the eigenvalues are $k^2 = (n\pi/L)^2$. Note that this solution is valid when
$n = 0$, so that $k = 0$ and $f(x) = A$, corresponding to the special case ($k = 0$)
mentioned above.


The method of separation of variables thus leads to a set of solutions of the
form

$$\theta(x,t) = A_n \cos(k_n x) \exp(-Dk_n^2 t), \qquad k_n = \frac{n\pi}{L}, \qquad n = 0, 1, 2, \ldots, \tag{4.15}$$

where each $A_n$ is an undetermined constant, and we have used the expression
for $g(t)$ quoted below equation (4.14).


We should briefly consider whether other acceptable solutions could be
obtained by taking a positive separation constant, +Dk², say. This choice
would lead to a solution of equation (4.14) of the form exp(Dk²t), which
is exponentially increasing, contrary to expectations from everyday
experience (and the laws of physics), which suggest that temperature
differences between different parts of the rod should decrease. A more
compelling reason for rejecting the positive separation constant comes from
considering the form of the spatial dependence, which becomes f(x) =
a cosh(kx) + b sinh(kx). The boundary condition f'(0) = f'(L) = 0 cannot
be satisfied for any value of k other than k = 0 (apart from in the trivial case
where a = b = 0). The case k = 0 corresponds to f(x) = constant, which is
the n = 0 case of the set of solutions given in equations (4.15).
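
The rejection of the positive separation constant can be written out in a few
lines. The following is a sketch of that step, using the same symbols a, b and
k as in the paragraph above and assuming k ≠ 0:

```latex
\begin{align*}
f(x)  &= a\cosh(kx) + b\sinh(kx), \\
f'(x) &= k\,\bigl[a\sinh(kx) + b\cosh(kx)\bigr], \\
f'(0) &= kb = 0 \;\Rightarrow\; b = 0, \\
f'(L) &= ka\sinh(kL) = 0 \;\Rightarrow\; a = 0
         \quad\text{(since } \sinh(kL) \neq 0 \text{ when } kL \neq 0\text{)}.
\end{align*}
```

Only the trivial solution a = b = 0 survives, which is why the hyperbolic
solutions are discarded.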


4.3.3 General solution 


Because the diffusion equation is a linear equation, a general solution is 
obtained as a linear combination of the solutions (4.15). A general solution 
of the diffusion equation satisfying the boundary conditions for t > 0 is 


\[ \theta(x,t) = \sum_{n=0}^{\infty} A_n \cos\left(\frac{n\pi x}{L}\right) \exp\left(-\frac{\pi^2 n^2 D t}{L^2}\right). \qquad (4.16) \]


Note that the exponential factors exp(−D k_n² t) result in all of the
temperature differences decreasing over time, with the components with more
rapid spatial variation (those with larger values of k_n) decreasing more rapidly.
The same exponential factor was seen earlier, in Exercise 4.4.
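
To make the decay of the individual terms concrete, the following minimal
sketch evaluates a truncated version of the series (4.16). The function name
`theta_series` and the coefficient values are hypothetical, chosen only for
illustration; the defaults D = 1 and L = 1 are likewise assumptions.

```python
import math

def theta_series(x, t, coeffs, D=1.0, L=1.0):
    """Evaluate the truncated series (4.16) for coefficients A_0, ..., A_N."""
    total = 0.0
    for n, A_n in enumerate(coeffs):
        k_n = n * math.pi / L                      # k_n = n*pi/L
        total += A_n * math.cos(k_n * x) * math.exp(-D * k_n**2 * t)
    return total

# Hypothetical coefficients, used only to show the behaviour of the series.
coeffs = [1.0, 0.5, 0.25]
for t in (0.0, 0.05, 0.5):
    print(t, theta_series(0.0, t, coeffs))
```

As t grows, the printed value approaches A_0, because every term with n ≥ 1
is suppressed by its exponential factor, the higher n the faster.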




Exercise 4.5 


Check that equation (4.16) does satisfy both the diffusion equation and the
boundary conditions at both x = 0 and x = L.


4.3.4 Orthogonality relation 


The coefficients A_n in equation (4.16) remain to be determined. This is done
by choosing them so that the initial condition at t = 0 is satisfied. Setting
t = 0 in equation (4.16) gives an expression for θ_0(x) = θ(x,0) in the form

\[ \theta_0(x) = \sum_{n=0}^{\infty} A_n \cos\left(\frac{n\pi x}{L}\right). \qquad (4.17) \]


This can be recognised as a Fourier series, of the form introduced in Block
I, Chapter 3, equation (3.1); here the period is 2L, and all of the b_n
coefficients are zero. Formulae for determining the Fourier coefficients using
orthogonality relations were discussed in Block I, Chapters 2 and 3. For this
Fourier series, we have the orthogonality relation

\[ \frac{2 - \delta_{n0}}{L} \int_0^L dx \, \cos\left(\frac{n\pi x}{L}\right) \cos\left(\frac{m\pi x}{L}\right) = \delta_{nm}. \qquad (4.18) \]


(The multiplier is 2/L when n ≠ 0 and 1/L when n = 0; the Kronecker delta
symbol is used here so that the case n = 0 can be covered without writing a 
separate equation.) This can be checked by calculating the integral directly, 
but Section 4.4 will discuss a more direct and more fundamental approach 
to demonstrating orthogonality relations. 
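
The direct check mentioned above can also be done numerically. The sketch below
estimates the left-hand side of (4.18) with a midpoint rule; the function name
`cos_overlap`, the step count and the choice L = 1 are assumptions made for
illustration.

```python
import math

def cos_overlap(n, m, L=1.0, steps=2000):
    """Midpoint-rule estimate of the left-hand side of equation (4.18)."""
    h = L / steps
    integral = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        integral += math.cos(n * math.pi * x / L) * math.cos(m * math.pi * x / L) * h
    multiplier = (1.0 if n == 0 else 2.0) / L      # (2 - delta_{n0}) / L
    return multiplier * integral

for n, m in [(0, 0), (3, 3), (2, 5), (0, 4)]:
    print(n, m, round(cos_overlap(n, m), 9))
```

The estimates should be close to 1 when n = m and close to 0 otherwise, in
line with the Kronecker delta on the right-hand side.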


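4.3.5 Calculation of Fourier coefficients


We can now calculate the Fourier coefficients in equation (4.17). We multiply
both sides by f_m(x) = cos(mπx/L), integrate, and use the orthogonality
relation (4.18):

\[ \int_0^L dx \, \theta_0(x) \cos\left(\frac{m\pi x}{L}\right) = \sum_{n=0}^{\infty} A_n \int_0^L dx \, \cos\left(\frac{n\pi x}{L}\right) \cos\left(\frac{m\pi x}{L}\right) = \sum_{n=0}^{\infty} A_n \frac{L}{2 - \delta_{n0}} \, \delta_{nm} = \frac{L A_m}{2 - \delta_{m0}}. \qquad (4.19) \]

So (replacing the index m with n) the Fourier coefficients are

\[ A_n = \frac{2 - \delta_{n0}}{L} \int_0^L dx \, \cos\left(\frac{n\pi x}{L}\right) \theta_0(x). \qquad (4.20) \]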


Having determined the coefficients A_n, we have a solution (equation (4.16))
which satisfies all of the requirements listed in equations (4.10). The solution
is in the form of an infinite series. We shall not consider the convergence of
this series in detail, but we note that for any positive value of t, the factors
exp(−π²n²Dt/L²) in equation (4.16) make the terms decrease very rapidly
as n → ∞. Therefore, for positive times, the series in equation (4.16) always
converges rapidly.
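
When the integral in (4.20) cannot be done by hand, the coefficients can be
estimated numerically. The following is a minimal sketch using a midpoint rule;
the function names and the initial profile `theta0` are hypothetical, and
L = 1 is assumed.

```python
import math

def fourier_coefficient(theta0, n, L=1.0, steps=4000):
    """Midpoint-rule estimate of A_n from equation (4.20)."""
    h = L / steps
    integral = sum(
        math.cos(n * math.pi * (j + 0.5) * h / L) * theta0((j + 0.5) * h) * h
        for j in range(steps)
    )
    return (1.0 if n == 0 else 2.0) / L * integral  # (2 - delta_{n0}) / L

def theta0(x):
    # Hypothetical initial profile with zero gradient at both ends (L = 1).
    return 3.0 + 0.5 * math.cos(2.0 * math.pi * x)

coeffs = [fourier_coefficient(theta0, n) for n in range(5)]
print([round(c, 6) for c in coeffs])
```

For this particular profile the estimates should be close to A_0 = 3 and
A_2 = 0.5, with the other coefficients close to zero, since the profile is
itself a sum of two of the eigenfunctions.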




Exercise 4.6 


Obtain the coefficients A_n in the case where θ_0(x) = x on the interval 0 < x < L.
[Hint: A similar Fourier series was obtained in Block I, Chapter 3, Exercise 3.6.]


Exercise 4.7 


A steel poker of length L has one end in a fire and the other end in a bowl of water.
It reaches a steady state, in which ∂θ/∂t = 0. Assuming that the temperature
obeys the diffusion equation, show that the general form for θ(x) in this case is
θ(x) = θ̄ + a(x − ½L), where θ̄ and a are constants. Note that θ̄ is the mean value
of the temperature of the poker.


[Hint: This is nothing to do with Fourier series: just write down and solve the
diffusion equation for the steady state, with ∂θ/∂t = 0.]


Exercise 4.8 


The poker in the previous exercise is removed from the ‘source’ and ‘sink’ of heat 
(i.e. the fire and the water, respectively), and is immediately placed in a thermally 
insulating environment. Show that the time-dependence of the temperature from 
the moment the poker becomes thermally isolated is given by 


\[ \theta(x,t) = \bar{\theta} - \sum_{n=1}^{\infty} \frac{2aL\,[1 - (-1)^n]}{\pi^2 n^2} \exp\left(-\frac{\pi^2 n^2 D t}{L^2}\right) \cos\left(\frac{n\pi x}{L}\right). \]


Note that only terms with n odd contribute to the sum. Use the substitution
n = 2k + 1 to rewrite the summation in a form with no terms which are identically
zero.


The temperature of the poker in Exercise 4.8 is plotted in Figure 4.5 for 
three different times. 


Figure 4.5 The temperature in a thin, thermally isolated metal rod which initially
has a uniform temperature gradient. The thermal diffusion constant is D = 1, the
length is L = 1, the temperature gradient is a = −1, and the average temperature
is θ̄.


The example of a metal rod with an initially uniform temperature gradient
illustrates a subtle point about the representation of functions by Fourier
series. In this example, the temperature gradient at the ends of the rod is
initially a. However, the temperature θ_0(x) was represented by a Fourier
series in the form of a sum of cosines, each term of which has a derivative
which vanishes at x = 0 and at x = L. It therefore appears that the infinite
sum of the Fourier series can have a property which is not present in any
of its component terms, or any finite sum of the first N terms, no matter
how large we make N. Understanding such subtleties of the convergence
of Fourier series was a stimulus to refine pure mathematical analysis in the
nineteenth century.


As soon as the rod is made thermally isolated, there is no flux of heat at 
its ends. At any instant after the rod has been disconnected from the heat 
source and heat sink, the temperature gradient at the ends must be zero. 
This effect is clearly visible in Figure 4.5, where the solution after the very 
short time t = 0.01 is very close to the t = 0 initial condition, except at the 
ends of the rod. 
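
The behaviour described here can be explored with a partial sum of the series
quoted in Exercise 4.8. This is a sketch only: the values D = 1, L = 1 and
a = −1 follow the caption of Figure 4.5, while the mean temperature 0.5, the
number of terms and the function name are assumptions made for illustration.

```python
import math

def poker_temperature(x, t, mean=0.5, a=-1.0, D=1.0, L=1.0, terms=200):
    """Partial sum of the series quoted in Exercise 4.8 (mean is an assumed value)."""
    total = mean
    for n in range(1, terms + 1):
        amplitude = 2.0 * a * L * (1 - (-1) ** n) / (math.pi * n) ** 2
        decay = math.exp(-(math.pi * n) ** 2 * D * t / L**2)
        total -= amplitude * decay * math.cos(n * math.pi * x / L)
    return total

h = 1e-4
end_gradient = (poker_temperature(h, 0.01) - poker_temperature(0.0, 0.01)) / h
mid_gradient = (poker_temperature(0.5 + h, 0.01) - poker_temperature(0.5 - h, 0.01)) / (2 * h)
print(end_gradient, mid_gradient)
```

At t = 0.01 the estimated gradient at the end x = 0 should be close to zero,
while the gradient at the centre of the rod should still be close to the
initial value a, which is the effect visible in the figure.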


4.4 Orthogonality of eigenfunctions 


You may have encountered the term ‘eigenvalue’ when studying matrices,
and the term ‘orthogonal’ may be familiar from studying geometry or
vectors. This section explains the connection between the older uses of these
terms and their use in discussions of solutions of differential equations.


In Section 4.3 and several places in Block I we have seen that eigenfunctions 
of differential equations have an orthogonality property, which proves very 
convenient when determining the coefficients of the general solution. Until 
now, these orthogonality relations have been discussed on a case-by-case ba- 
sis, and you may have wondered whether there is any general principle which 
implies that the eigenfunctions are always orthogonal. This section shows 
how the orthogonality relation discussed in Section 4.3 can be obtained 
without calculating an integral. The method can be adapted to many other 
problems involving solving ordinary and partial differential equations. In 
fact, it is often possible to show that the sets of eigenfunctions are orthogo- 
nal, even in cases where they cannot be calculated exactly. 


4.4.1 Eigenfunctions and eigenvalues 


Although the term eigenvalue may already be familiar from studying ma- 
trices, in greater generality, eigenvalues are properties of linear operators, 
which include matrices. 


An operator, denoted by A, takes a function f and maps it to another
function, g = Af. A linear operator A has the property that if f_1, f_2 are
two functions, and a_1, a_2 are two constants, then

\[ A(a_1 f_1 + a_2 f_2) = a_1 A f_1 + a_2 A f_2. \qquad (4.21) \]


An example of a linear operator acting on a function f(x) is differentiation:
Af is the function df/dx. Another example is multiplication by another
function a(x), so that Af is the function a(x) f(x).


An eigenfunction of A is a function f which satisfies

\[ A f = \lambda f \qquad (4.22) \]

for some constant λ, which is called an eigenvalue. We say that f is an
eigenfunction of A associated with the eigenvalue λ. Often, additional conditions
(such as boundary conditions) are imposed upon the function f.


An N × N square matrix A can be regarded as a linear operator, in the
sense described above. In this case the functions f correspond to vectors.
A vector x with N elements x_i, i = 1, ..., N, can be regarded as a mapping
or function from the ordered set {1, ..., N} to the ordered set {x_1, ..., x_N}.
The eigenvalues λ and eigenvectors x of the matrix are defined by

\[ A \mathbf{x} = \lambda \mathbf{x}. \qquad (4.23) \]

This is a special case of the previous equation, so eigenvectors of matrices
are simply a special case of eigenfunctions of linear operators.
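
As a small illustration of equation (4.23), the sketch below checks two
eigenvectors of a symmetric 2 × 2 matrix and confirms that they are
orthogonal. The matrix and the helper names `matvec` and `dot` are
hypothetical choices made for this example.

```python
def matvec(A, x):
    """Apply the matrix A (a list of rows) to the vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(u, v):
    """Scalar product of two vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

A = [[2.0, 1.0],
     [1.0, 2.0]]
pairs = [(3.0, [1.0, 1.0]), (1.0, [1.0, -1.0])]     # (eigenvalue, eigenvector)

for lam, x in pairs:
    print(matvec(A, x), [lam * x_i for x_i in x])   # the two lists agree
print(dot(pairs[0][1], pairs[1][1]))                # scalar product of the eigenvectors
```

The eigenvectors belong to different eigenvalues and their scalar product
vanishes, which is the matrix counterpart of the orthogonality of
eigenfunctions discussed in this section.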


4.4.2 Orthogonality of functions 


Now let us consider why two functions are said to be orthogonal if the
integral of their product is equal to zero. Two lines are said to be orthogonal
if they are at right angles. If these two lines are represented by the vectors
a and b (i.e. vectors aligned along the directions of the lines), the condition
for the lines to be orthogonal is that the scalar product of the two vectors
is zero:

\[ \mathbf{a} \cdot \mathbf{b} = \sum_i a_i b_i = 0, \qquad (4.24) \]

where the a_i and b_i are the components of the vectors a and b, respectively.


Two functions f and g are said to be orthogonal (on the interval [0, L]) if

\[ \int_0^L dx \, f(x) g(x) = 0. \qquad (4.25) \]


When dealing with linear operators, it is often helpful to think of the
functions they act upon as being vectors in an infinite-dimensional space. The
analogy with equation (4.24) becomes clear if the integral is approximated
by a sum. Introduce two vectors f_ε and g_ε of dimension N = Int(L/ε) (i.e.
N is the integer part of L/ε) with elements

\[ f_j = \sqrt{\varepsilon}\, f(j\varepsilon), \qquad g_j = \sqrt{\varepsilon}\, g(j\varepsilon), \qquad j = 1, \ldots, N, \qquad (4.26) \]

so

\[ \mathbf{f}_\varepsilon \cdot \mathbf{g}_\varepsilon = \varepsilon \sum_{j=1}^{N} f(j\varepsilon)\, g(j\varepsilon). \qquad (4.27) \]

Then, from the usual definition of an integral as a limit, we have

\[ \int_0^L dx \, f(x) g(x) = \lim_{\varepsilon \to 0} \mathbf{f}_\varepsilon \cdot \mathbf{g}_\varepsilon, \qquad (4.28) \]

so the integral in equation (4.28) may be thought of as a scalar product of
two infinite-dimensional vectors. It is then natural to define functions to be
orthogonal if the integral of their product vanishes.
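
The limit in equation (4.28) can be seen numerically by building the vectors
of equation (4.26) for a small value of ε. This is a minimal sketch; the
function name, the test functions and the value L = 1 are assumptions made
for illustration.

```python
import math

def discrete_scalar_product(f, g, L=1.0, eps=1e-3):
    """Scalar product f_eps . g_eps of equation (4.27)."""
    N = int(L / eps)                                # N = Int(L / eps)
    f_vec = [math.sqrt(eps) * f(j * eps) for j in range(1, N + 1)]
    g_vec = [math.sqrt(eps) * g(j * eps) for j in range(1, N + 1)]
    return sum(f_j * g_j for f_j, g_j in zip(f_vec, g_vec))

def c1(x):
    return math.cos(math.pi * x)                    # eigenfunction with n = 1, L = 1

def c2(x):
    return math.cos(2.0 * math.pi * x)              # eigenfunction with n = 2, L = 1

print(discrete_scalar_product(c1, c2))                        # small: orthogonal functions
print(discrete_scalar_product(lambda x: x, lambda x: x))      # approximates the integral of x^2
```

The first scalar product tends to zero as ε is reduced, and the second tends
to 1/3, the integral of x² over [0, 1].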




4.4.3 Orthogonality relation


In Section 4.3 we derived the eigenfunctions f_n(x) which arise when finding
the temperature distribution in a thermally isolated rod. If we normalise
these eigenfunctions such that

\[ \int_0^L dx \, [f_n(x)]^2 = 1, \qquad (4.29) \]

the normalised eigenfunctions are

\[ f_n(x) = \sqrt{\frac{2 - \delta_{n0}}{L}} \cos\left(\frac{n\pi x}{L}\right), \qquad n = 0, 1, 2, \ldots. \qquad (4.30) \]

These eigenfunctions are orthonormal, and satisfy

\[ \int_0^L dx \, f_n(x) f_m(x) = \delta_{nm}, \qquad (4.31) \]


which is equivalent to equation (4.18). This can be checked by calculating
the integral directly, but there is an alternative method, which allows
us to demonstrate that eigenfunctions of certain differential equations are
orthogonal, even when we cannot write down formulae for the solutions.
(This approach will be developed further in Section 4.5.) The idea is to
obtain the orthogonality relation directly from the differential equation and
boundary conditions satisfied by f(x), without having to evaluate the
integral. Let f_n(x) and f_m(x) be solutions of the differential equation, such
that f_n'' + k_n² f_n = 0, f_n'(0) = f_n'(L) = 0, and f_m satisfies the same relations
with a different separation constant k_m² ≠ k_n². Now consider an integral I_nm
defined by

\[ I_{nm} = \int_0^L dx \, \bigl[ f_n(x) f_m''(x) - f_m(x) f_n''(x) \bigr]. \qquad (4.32) \]

We note that

\[ \frac{d}{dx} \bigl[ f_n(x) f_m'(x) - f_m(x) f_n'(x) \bigr] = f_n(x) f_m''(x) - f_m(x) f_n''(x), \qquad (4.33) \]

so we can integrate equation (4.32) to obtain

\[ I_{nm} = \bigl[ f_n(x) f_m'(x) - f_m(x) f_n'(x) \bigr]_0^L. \qquad (4.34) \]

The boundary conditions imply that I_nm = 0, because f_n'(x) and f_m'(x) are
zero at x = 0 and x = L. We can also simplify the expression for I_nm in
equation (4.32) by using the differential equation to write f_n'' = −k_n² f_n, and
similarly for f_m'', so equation (4.32) becomes

\[ (k_n^2 - k_m^2) \int_0^L dx \, f_n(x) f_m(x) = 0. \qquad (4.35) \]


We conclude that if k_n ≠ k_m, the solutions f_n and f_m are orthogonal on the
interval on which the differential equation is considered. This approach will
work in many other problems involving linear partial differential equations, 
enabling us to show that eigenfunctions are orthogonal, without doing a 
detailed calculation. The case where n = m in equation (4.31) must be 
checked separately by a direct calculation of the integral. 




4.5 Diffusion in a finite medium in 
two and three dimensions 


Next we consider how to treat diffusion of heat or particles in a finite
medium in two and three dimensions: the same methods are applicable
in both cases, but we must interpret symbols representing vectors as
two-dimensional (r = (x, y)) or three-dimensional (r = (x, y, z)) objects
according to the context. We discuss the problem in terms of the flow of heat in a
thermally isolated system, and solve the diffusion equation for the
temperature θ(r,t). Exactly the same approach is applicable to describing the
diffusion of particles using a concentration c(r,t). We shall follow the method
used in the one-dimensional case as closely as possible. You will see that
this leads to an interesting generalisation of the notion of Fourier series.


4.5.1 The Helmholtz equation 


Consider the problem of determining the temperature distribution inside a 
thermally isolated piece of material in three dimensions. The initial tem- 
perature distribution is assumed to be known. The boundary-value problem 
now takes the form 


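\[ \begin{aligned}
\text{differential equation:} \quad & \frac{\partial\theta}{\partial t} = D\nabla^2\theta, \quad t > 0; \\
\text{boundary condition:} \quad & \nabla\theta \cdot \mathbf{n} = 0, \quad t > 0 \text{ and } \mathbf{r} \text{ on the boundary}; \\
\text{initial condition:} \quad & \theta(\mathbf{r},0) = \theta_0(\mathbf{r}).
\end{aligned} \qquad (4.36) \]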


These equations are a natural extension of the one-dimensional case,
equations (4.10). The first line specifies that θ(r,t) obeys the diffusion equation
for all t > 0 and all positions inside the volume V.


The second line of equations (4.36) specifies the boundary condition that the 
gradient of the solution in the direction of the normal to the boundary is zero. 
(Here n is the unit vector normal to the boundary.) Unless otherwise stated, 
we assume that the boundary of the region is smooth, so the outward normal 
n is well defined. This condition holds for all positions on the boundary, 
and for all positive times. This boundary condition is a consequence of the 
system being thermally isolated (that is, not transferring heat to or from 
its surroundings). If no heat flows to or from the surroundings, the flux of 
heat across every element of the surface is zero, and Fourier’s law (Chapter 
3, Section 3.6) implies that the gradient of the temperature in the direction 
normal to the surface is zero everywhere on the surface. Sometimes the 
notation ∂θ/∂n is used for this ‘normal gradient’.


The third line of equations (4.36) specifies the initial condition that θ = θ_0(r)
at time t = 0. (The initial distribution, θ_0(r), is arbitrary, and the boundary
condition need not be satisfied by the initial condition in cases where heat
has been flowing into a region which is suddenly made thermally isolated at
time t = 0. A one-dimensional version of this situation was exemplified in
Exercise 4.8.)


The solution follows the one-dimensional case in that it uses the method of 
separation of variables, but there are some differences. The first step is to 
separate variables, writing a trial solution in the form 


\[ \theta(\mathbf{r},t) = g(t)\, f(\mathbf{r}). \qquad (4.37) \]




Substitution into the diffusion equation, and repeating the argument for
separation into two independent equations, shows that f satisfies an equation
analogous to (4.13), of the form

\[ \nabla^2 f + \lambda f = 0, \qquad (4.38) \]

where λ is a real constant. This equation is known as the Helmholtz equation.
The function g(t) satisfies the equation g' + Dλg = 0, as before, except
that we have changed the name of the separation constant from k² to λ.
The Helmholtz equation also arises in solving the three-dimensional wave
equation, after separating out the time-dependent part of the solution.


Exercise 4.9 


Derive equation (4.38) and the corresponding equation for g(t). 


Hermann von Helmholtz (1821-94) obtained his equation by applying separa- 
tion of variables to the wave equation. Helmholtz made lasting contributions 
to many areas of science, including the physiology of the eye, ear, larynx and 
nervous system, thermodynamics, fluid dynamics, acoustics, and electromag- 
netism. 


We ensure that θ(r,t) satisfies the boundary condition by requiring that
solutions of the Helmholtz equation satisfy the boundary condition. That
is, we require that the gradient of the function f in the direction normal
to the boundary is zero. Mathematically, this is expressed by stating that
∇f·n = 0 for every point on the boundary, where n is a unit vector normal
to the boundary at that point. This is called a Neumann boundary condition.
It is a fact that solutions of the Helmholtz equation satisfying this boundary
condition can be found only for a discrete set of non-negative values of
λ. (This is hard to prove, but will be illustrated by an example later.)
These values will be denoted by an integer subscript, labelling the λ_n as an
ascending sequence, which starts from n = 0. There are an infinite number
of these values λ_n for which solutions of the Helmholtz equation satisfying
the appropriate boundary condition can be found. These values, λ_n, are
known as the eigenvalues of the Helmholtz equation, and the corresponding
functions f_n(r) are called the eigenfunctions. The set of eigenvalues is called
the spectrum.




Exercise 4.10 


In the case of the one-dimensional example treated in Section 4.3, which equation 
corresponds to the Helmholtz equation? What are the eigenfunctions and eigenval- 
ues? 






4.5.2 A solution of the Helmholtz equation in two 
dimensions 


There is no general technique for finding exact solutions of the Helmholtz 
equation in two or three dimensions which will always be successful. Solu- 
tions of the Helmholtz equation have been obtained by separation of vari- 
ables in a limited number of cases. The approach works for systems with 
boundaries in the form of simple geometrical shapes (rectangular boundaries 
and those with circular or spherical symmetry), which are often the cases 
of most interest in applications. For a system with an irregularly shaped 
boundary, you would have to use a computer program to find numerical 
approximations to solutions of the Helmholtz equation. 


In two dimensions, the limitations on finding exact solutions of the Helmholtz 
equation have been studied in detail. Exact solutions are available for the 
rectangle, the circle, the ellipse, and three triangles (equilateral, 45-90-45 and 
60-30-90). Even for a shape as simple as a triangle with other choices of 
angles, it is known that no exact expression for the eigenvalues of the Helmholtz 
equation can exist. 


We now consider the solution of the Helmholtz equation by separation of 
variables for a rectangular domain in two dimensions, with the boundary 
condition ∇f·n = 0, where the sides of the rectangle have lengths a and
b. A very similar problem in Block I discussed the solution of the two- 
dimensional Helmholtz equation arising from separation of variables for the 
wave equation. 


We need to determine functions f(x,y) which satisfy the Helmholtz equation

\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \lambda f = 0 \qquad (4.39) \]

in the region 0 < x < a, 0 < y < b, with the boundary condition n·∇f = 0.
These eigenfunctions exist only for certain choices of λ, which are the
eigenvalues of the problem.


Consider how to describe the boundary condition in a specific form. The 
boundary consists of four segments and four corners (illustrated in Figure 
4.6), and the boundary condition takes a different form on each boundary 
component. The four segments (excluding the corners) are as follows. 


Segment 1


The line segment x = 0, 0 < y < b. The normal may be taken to be n = −i.
(The sign is arbitrary here, but we quote the choice of sign which makes n
point outside the rectangle.) The gradient vector ∇f is expressed in terms
of partial derivatives of f, so the boundary condition ∇f·n = 0 becomes

\[ \nabla f \cdot \mathbf{n} = -\left( \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} \right) \cdot \mathbf{i} = -\frac{\partial f}{\partial x} = 0. \qquad (4.40) \]

This condition must be satisfied at all points along the line segment. The
boundary condition on this segment is therefore

\[ \frac{\partial f}{\partial x}(0,y) = 0, \qquad 0 < y < b. \qquad (4.41) \]


Segment 2


The line segment y = 0, 0 < x < a. On this segment n = −j, and the
boundary condition is ∂f/∂y = 0. We therefore require

\[ \frac{\partial f}{\partial y}(x,0) = 0, \qquad 0 < x < a. \qquad (4.42) \]




Segment 3 


The line segment x = a, 0 < y < b. Here n = i and the boundary condition
is the same as for segment 1, except that x = a:

\[ \frac{\partial f}{\partial x}(a,y) = 0, \qquad 0 < y < b. \qquad (4.43) \]


Segment 4


The line segment y = b, 0 < x < a. Here n = j and the boundary condition
is

\[ \frac{\partial f}{\partial y}(x,b) = 0, \qquad 0 < x < a. \qquad (4.44) \]


Corners 


Corners pose a problem because n is ill-defined on these. However, if ∇f
vanishes on these corners, then the appropriate (zero heat flux) boundary
condition is satisfied. Consider two points very close to a corner, situated
on either side of it, so that the normals are in two perpendicular directions.
At each of these points the normal gradient vanishes, in two perpendicular
directions. We make both points approach the corner, so that there is a
point at which ∂f/∂x = 0, which is arbitrarily close to a point at which
∂f/∂y = 0. If we assume that the function f(x,y) is twice differentiable,
both components of ∇f must approach zero (as required) as we approach the
corner. You can check that the solutions we obtain do satisfy this criterion.


Figure 4.6 We consider the solution of the Helmholtz equation (∇² + λ)f = 0 on
the interior of the rectangle 0 < x < a, 0 < y < b, satisfying the Neumann
boundary condition ∇f·n = 0. The boundary condition is considered separately
for the different segments of the boundary.


We need solutions which satisfy the boundary condition everywhere on the
boundary, implying that all four of these conditions must be satisfied. We
guess that the method of separation of variables might work, and write

\[ f(x,y) = X(x)\,Y(y). \qquad (4.45) \]

Substituting this expression into the Helmholtz equation gives X''Y + XY'' +
λXY = 0. Dividing by XY gives

\[ \frac{1}{X} \frac{d^2 X}{dx^2} = -\lambda - \frac{1}{Y} \frac{d^2 Y}{dy^2}. \qquad (4.46) \]

The left- and right-hand sides of this relation are independent of y and x,
respectively, and are therefore equal to a constant. Based on our
experience with one-dimensional problems, we anticipate that negative or zero
values of the new separation constant will produce sinusoidal solutions X(x)
which can satisfy the boundary conditions. If you wish, you can check that
the solutions found for a positive separation constant cannot satisfy the
boundary conditions. We also expect that Y(y) should have sinusoidal
solutions. We write the separation constant for equation (4.46) as −k_x², and set
k_y = √(λ − k_x²) (assuming that λ − k_x² ≥ 0), so λ = k_x² + k_y². We then have to
solve two equations:

\[ X'' + k_x^2 X = 0, \qquad Y'' + k_y^2 Y = 0. \qquad (4.47) \]
These equations have sinusoidal solutions, X(x) = X_1 cos(k_x x) + X_2 sin(k_x x)
and Y(y) = Y_1 cos(k_y y) + Y_2 sin(k_y y), where X_1, X_2, Y_1, Y_2 are constants.
In terms of the functions X(x) and Y(y), the boundary conditions are

\[ \begin{aligned}
\text{segment 1:} \quad & X'(0)\,Y(y) = 0, \quad 0 < y < b; \\
\text{segment 2:} \quad & X(x)\,Y'(0) = 0, \quad 0 < x < a; \\
\text{segment 3:} \quad & X'(a)\,Y(y) = 0, \quad 0 < y < b; \\
\text{segment 4:} \quad & X(x)\,Y'(b) = 0, \quad 0 < x < a.
\end{aligned} \qquad (4.48) \]

These conditions are satisfied by requiring that X and Y satisfy

\[ X'(0) = X'(a) = 0, \qquad Y'(0) = Y'(b) = 0. \qquad (4.49) \]


The functions X and Y therefore satisfy Neumann boundary conditions, as
in the one-dimensional case (see page 122): the equation X'' + k_x² X = 0 is
solved satisfying X'(0) = X'(a) = 0, and Y'' + k_y² Y = 0 is solved satisfying
Y'(0) = Y'(b) = 0. Comparing with the one-dimensional case, L is replaced
by a for X(x) and by b for Y(y). Thus, solutions for X(x) and Y(y) satisfying
these boundary conditions are

\[ X(x) = \cos\left(\frac{n\pi x}{a}\right), \qquad Y(y) = \cos\left(\frac{m\pi y}{b}\right), \qquad (4.50) \]

where n, m are integers labelling the solutions. (We take n and m to be
non-negative, to avoid giving two labels to the same solution.) The constants k_x,
k_y are, respectively, nπ/a and mπ/b. Because equations (4.47) are linear,
these solutions can be multiplied by arbitrary constants.


Taking products of these functions, the eigenfunctions of the Helmholtz
equation on the rectangle with Neumann boundary condition are therefore

\[ f_{n,m}(x,y) = C_{n,m} \cos\left(\frac{n\pi x}{a}\right) \cos\left(\frac{m\pi y}{b}\right), \qquad (4.51) \]

where each C_{n,m} is an arbitrary constant. The corresponding eigenvalues of
the Helmholtz equation for our rectangle are λ = k_x² + k_y². In terms of the
two integers n, m labelling the eigenfunctions, these are

\[ \lambda_{n,m} = \left(\frac{n\pi}{a}\right)^2 + \left(\frac{m\pi}{b}\right)^2, \qquad n, m = 0, 1, 2, \ldots. \qquad (4.52) \]
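
The claim that these products solve the Helmholtz equation can be spot-checked
with finite differences. The sketch below takes C_{n,m} = 1 and uses
hypothetical helper names, side lengths and sample points; it is a numerical
sanity check rather than a proof.

```python
import math

def eigenfunction(n, m, a, b):
    """Return f_{n,m}(x, y) of (4.51) with C_{n,m} = 1, and lambda_{n,m} of (4.52)."""
    lam = (n * math.pi / a) ** 2 + (m * math.pi / b) ** 2
    def f(x, y):
        return math.cos(n * math.pi * x / a) * math.cos(m * math.pi * y / b)
    return f, lam

def helmholtz_residual(f, lam, x, y, h=1e-3):
    """Central-difference estimate of (Laplacian of f) + lam * f at (x, y)."""
    lap = (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4.0 * f(x, y)) / h**2
    return lap + lam * f(x, y)

a, b = 1.0, 2.0                                     # assumed side lengths
f, lam = eigenfunction(2, 1, a, b)
print(helmholtz_residual(f, lam, 0.3, 0.7))         # small compared with lam * f

h = 1e-3
print((f(a + h, 0.7) - f(a - h, 0.7)) / (2 * h))    # normal derivative on segment 3
```

Both printed numbers should be close to zero: the first because f satisfies
(4.39), the second because of the Neumann condition (4.43).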


In this example it was natural to use two integers n, m to label the
eigenvalues λ. In cases where the Helmholtz equation cannot be solved by separation
of variables, the eigenvalues cannot be labelled in a natural way by pairs
of integers. It is always possible to rank the eigenvalues by their size, and
then label them by a single index, namely the order in this list. To account
for this more general case, in some expressions below we shall use a single
summation sign when we sum over eigenvalues.


Sometimes there are two or more distinct (that is, linearly independent) 
eigenfunctions f(r) which have the same eigenvalue λ. In cases where there
are two or more linearly independent eigenfunctions corresponding to a par- 
ticular eigenvalue, that eigenvalue is said to be degenerate. If just two lin- 
early independent eigenfunctions have the same eigenvalue, the eigenvalue 
is said to be doubly degenerate. 
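
Ranking the eigenvalues (4.52) by size, and spotting degenerate ones, is easy
to do by machine. The sketch below is one possible approach: the function
name, the cut-off `n_max` and the rounding used to decide when two eigenvalues
count as equal are all assumptions, and only eigenvalues well below the
cut-off are listed completely.

```python
import math
from collections import defaultdict

def ranked_spectrum(a, b, n_max=6, digits=9):
    """List (lambda, [(n, m), ...]) in ascending order for 0 <= n, m <= n_max."""
    groups = defaultdict(list)
    for n in range(n_max + 1):
        for m in range(n_max + 1):
            lam = (n * math.pi / a) ** 2 + (m * math.pi / b) ** 2
            groups[round(lam, digits)].append((n, m))   # equal up to rounding
    return sorted(groups.items())

# Rectangle with assumed side lengths a = 1, b = 2.
for index, (lam, labels) in enumerate(ranked_spectrum(1.0, 2.0)[:5]):
    print(index, lam, labels)
```

Each entry of the list carries a single index (its position) together with
the pairs (n, m) that share that eigenvalue; an entry with more than one pair
is a degenerate eigenvalue.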


A set of N functions f_i(x) is linearly independent if there are no
(non-trivial) combinations of constants a_i such that
Σ_{i=1}^{N} a_i f_i(x) = 0 for all x.


Exercise 4.11


Show that there are degeneracies in the spectrum of the Helmholtz equation with 
a Neumann boundary condition on the rectangle in the case where all the sides are 
of equal length (so that the region is square). Are there other cases where there 
are degeneracies? 


Exercise 4.12 


Show that the function f(x) = exp(ikx) is an eigenfunction of the differential
operators d/dx and d²/dx². If the function must satisfy the periodic boundary condition
f(x+a) = f(x) for all x, what are the eigenvalues and the spectrum of each of these
operators? Are there any degeneracies?


4.5.3 Orthogonality and generalised Fourier series 


Returning to the solution of the diffusion equation in two or three
dimensions, a general solution satisfying the appropriate boundary condition may
be constructed from the eigenfunctions and eigenvalues of the Helmholtz
equation. Consider a linear combination of solutions of the form (4.37). The
function g(t) satisfies g' + Dλg = 0, which has a solution g(t) = exp(−Dλt).
We therefore have solutions of the form

\[ \theta(\mathbf{r},t) = \sum_{n=0}^{\infty} a_n f_n(\mathbf{r}) \exp(-D\lambda_n t), \qquad (4.53) \]


where the sum includes all eigenfunctions f_n(r) and their associated
eigenvalues λ_n, including cases where the eigenvalues are degenerate.


This solution satisfies the boundary conditions that were imposed upon the
eigenfunctions f_n(r) of the Helmholtz equation. The coefficients a_n are
determined by the requirement that this solution reproduces θ_0(r) at time
t = 0. It is not immediately clear how (or whether) this can be achieved.
Answering this question requires a discussion of the properties of the
eigenfunctions of the Helmholtz equation, which will lead to a powerful
generalisation of the concept of Fourier series.


In discussing properties of the f_n(r), it will be assumed that the
corresponding eigenvalues λ_n are distinct. This requirement is usually satisfied if the
boundary has no symmetry properties, but it is not satisfied in many cases
where there is a high degree of symmetry (such as the circular symmetry of
a cylinder or if the boundary is a square or rectangle with rationally related
side lengths). The eigenfunctions may be multiplied by any constant and
remain solutions of the Helmholtz equation satisfying the Neumann boundary
condition. We shall assume that the eigenfunctions are real-valued functions,
and that the multiplying constant is chosen so that

\[ \int_V dV \, [f_n(\mathbf{r})]^2 = 1, \qquad (4.54) \]


where V is the region within which the function f_n(r) satisfies the Helmholtz
equation. Eigenfunctions satisfying equation (4.54) are said to be
normalised. In the following discussion we shall assume that the eigenfunctions
have been normalised.


Note that here λ_n, etc., are labelled by a single index to account for more
general situations. Note also that the normalisation condition is different
from that occurring in probability theory, because here we integrate the
square of the function.


We shall now show that the eigenfunctions also have an orthogonality
property: they satisfy

\[ \int_V dV \, f_n(\mathbf{r}) f_m(\mathbf{r}) = 0 \qquad (4.55) \]

(provided n ≠ m). This will be demonstrated using a generalisation of the
argument given in Subsection 4.4.3. Consider the integral


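\[ I_{nm} = \int_V dV \, \bigl( f_n \nabla^2 f_m - f_m \nabla^2 f_n \bigr). \qquad (4.56) \]

For any doubly differentiable scalar fields f and g, we have

\[ \nabla \cdot (f \nabla g) = f \nabla^2 g + \nabla f \cdot \nabla g. \qquad (4.57) \]

(This is an example of the result of Exercise 3.12, page 91, where we take
a = f, F = ∇g.) Using this result, we can write I_nm as an integral of the
divergence of a vector field:

\[ I_{nm} = \int_V dV \, \nabla \cdot \bigl( f_n \nabla f_m - f_m \nabla f_n \bigr). \qquad (4.58) \]

Using Gauss’s theorem we can write I_nm as a surface integral over the surface
S which is the boundary of the volume V:

\[ I_{nm} = \int_S d\mathbf{S} \cdot \bigl( f_n \nabla f_m - f_m \nabla f_n \bigr). \qquad (4.59) \]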


The surface integral is equal to zero because of the boundary conditions
which are satisfied by the eigenfunctions f_n(r) and f_m(r): the gradient
of both of these functions in the direction n normal to the surface is zero
everywhere on the surface, and because dS is in the same direction as n, we
have ∇f_n·dS = 0 everywhere on the surface.


Using the fact that both f_n(r) and f_m(r) satisfy the Helmholtz equation,
the original expression for the integral I_nm, equation (4.56), can also be
written

\[ I_{nm} = (\lambda_n - \lambda_m) \int_V dV \, f_n(\mathbf{r}) f_m(\mathbf{r}). \qquad (4.60) \]

Comparing equations (4.59) and (4.60) shows that the integral of the product
f_n(r) f_m(r) over the region is zero when λ_n ≠ λ_m. Functions satisfying this
property are said to be orthogonal.


Assuming that all of the λ_n are distinct, and imposing the normalisation
condition (4.54), the eigenfunctions f_n(r) satisfy

\[ \int_V dV \, f_n(\mathbf{r}) f_m(\mathbf{r}) = \delta_{nm} = \begin{cases} 1, & n = m, \\ 0, & n \neq m. \end{cases} \qquad (4.61) \]

Sets of functions satisfying equation (4.61) are said to be orthonormal, that
is, both orthogonal (for n ≠ m) and normalised (for n = m). Note that this
equation can be regarded as a generalisation of a result which was crucial in
the development of Fourier series; see Block I, Chapter 3, equation (3.10).


In Block I, Chapter 3, the functions f_n(x) are complex-valued (as opposed
to real-valued, one of the assumptions leading to equation (4.61)). In Block
I, Chapter 3, the normalised functions are f_n(x) = exp(2πinx/L)/√L, and
in order to deal with the complex values taken by f_n(x), in equation (4.61)
f_m(x) is replaced by its complex conjugate, f_m*(x). Nothing more will be said
about how the above discussion needs to be modified to allow for
complex-valued functions f_n(r), as in this section on higher-dimensional generalisations
of Fourier series, only real-valued functions f_n(r) will be encountered.


[Margin note: Discussed in Block 0, Chapter 2, but see also Subsection 3.4.3 of this block.]

Now we return to the problem of determining a solution of the heat equation, which has been obtained in the form (4.53) with the Fourier coefficients $a_n$ as yet undetermined. We shall assume that any function of interest (without specifying this set of functions precisely) can be expressed as a linear combination of the eigenfunctions $f_n(\mathbf{r})$. This property of the eigenfunctions is termed completeness. With this assumption, the initial condition is to be written as a linear combination of the $f_n(\mathbf{r})$:


$$\theta_0(\mathbf{r}) = \sum_n a_n f_n(\mathbf{r}). \tag{4.62}$$

(This can be viewed as a generalisation of the Fourier series, as introduced in Block I, Chapter 3, equation (3.6), with the functions $f_n(\mathbf{r})$ replacing the exponential functions $\exp(2\pi i n x/L)/\sqrt{L}$.) To determine the Fourier coefficients $a_n$, we follow a procedure similar to that for the usual Fourier series: after relabelling the index $n$ in equation (4.62) by $m$, we multiply equation (4.62) by $f_n(\mathbf{r})$ and integrate over the volume $V$:

$$\int_V dV\, \theta_0(\mathbf{r})\, f_n(\mathbf{r}) = \sum_m a_m \int_V dV\, f_m(\mathbf{r})\, f_n(\mathbf{r}). \tag{4.63}$$

Equation (4.61) shows that the only integral in the sum which does not vanish is that for which $m = n$, and this integral is unity. It then follows that

$$a_n = \int_V dV\, \theta_0(\mathbf{r})\, f_n(\mathbf{r}). \tag{4.64}$$


This establishes the values of the coefficients $a_n$ in equation (4.62). Substitution of these coefficients into equation (4.53) gives the solution of the diffusion equation, satisfying all of the conditions of equations (4.36).
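
The projection step in equations (4.61) to (4.64) can be tried out numerically. The following is a minimal sketch, not part of the course text, which assumes the one-dimensional Neumann eigenfunctions on an interval of length `L` (normalised here as $1/\sqrt{L}$ and $\sqrt{2/L}\cos(n\pi x/L)$), a sample initial condition $\theta_0(x) = x$, and a simple midpoint rule for the integrals. The names `f`, `integrate`, `coefficient` and the grid size are illustrative choices.

```python
import math

L = 1.0          # length of the interval (illustrative value)
N_GRID = 2000    # number of midpoint-rule cells (illustrative value)

def f(n, x):
    """Orthonormal Neumann eigenfunctions of the 1D Helmholtz equation on [0, L]."""
    if n == 0:
        return 1.0 / math.sqrt(L)
    return math.sqrt(2.0 / L) * math.cos(n * math.pi * x / L)

def integrate(func):
    """Midpoint-rule approximation to the integral of func over [0, L]."""
    h = L / N_GRID
    return h * sum(func((j + 0.5) * h) for j in range(N_GRID))

def coefficient(theta0, n):
    """a_n = integral of theta0(x) f_n(x) dx, the 1D analogue of equation (4.64)."""
    return integrate(lambda x: theta0(x) * f(n, x))

def theta0(x):
    return x     # sample initial condition (an assumption for this sketch)

overlap_12 = integrate(lambda x: f(1, x) * f(2, x))   # orthogonality: close to 0
norm_2 = integrate(lambda x: f(2, x) ** 2)            # normalisation: close to 1
a = [coefficient(theta0, n) for n in range(40)]
approx = sum(a[n] * f(n, 0.3) for n in range(40))     # partial sum of (4.62) at x = 0.3
```

The partial sum `approx` should lie close to $\theta_0(0.3) = 0.3$; the agreement improves slowly as more terms are kept, because the coefficients of this particular $\theta_0$ decay only like $1/n^2$.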


The method described above followed the one-dimensional case quite closely, 
by introducing a broad generalisation of the idea of Fourier series, in which 
functions are expressed as linear combinations of eigenfunctions satisfying 
a differential equation and certain boundary conditions. Solutions based on 
generalised Fourier series are a powerful tool in further developments of the 
theory: they can be used to obtain interesting results even in cases where 
the eigenfunctions and eigenvalues cannot be calculated exactly. 


The eigenfunctions can be shown to form an orthonormal set in situations where other boundary conditions are applicable. For example, if an object is placed in a stirred liquid with very high thermal conductivity, the temperature at the surface of the object will be constant (which may be taken to be zero). The appropriate boundary condition for the eigenfunctions is then $f_n(\mathbf{r}) = 0$ for all positions $\mathbf{r}$ on the boundary. This is called the Dirichlet boundary condition. The eigenfunctions are also orthogonal in this case.


Exercise 4.13 


Show that the normalised eigenfunctions of the Helmholtz equation still satisfy equation (4.61) when the Dirichlet boundary condition ($f_n(\mathbf{r}) = 0$ for points $\mathbf{r}$ on the boundary) applies. Again, assume that there are no degenerate eigenvalues.


Exercise 4.14 


Determine the normalised eigenfunctions of the rectangle with side lengths a and 
b using the Dirichlet boundary condition. Show that the eigenvalues are still given 
by equation (4.52), except that cases where n = 0 or m = 0 are excluded. 


4.6 Semi-infinite domains: temperature waves 




This section will consider a solution of the diffusion equation in a semi-infinite domain. (We describe the method of solution in terms of a heat flow problem.) Here we do not assume that the initial temperature distribution is known, but we do specify either the temperature or the heat flux at the boundary. The physical context is that of a surface which is alternately heated and cooled, with repeat period $T$. An example is the surface of the Earth, which is alternately heated and cooled during the day (in this case $T = 24$ hours). The temperature below this surface is expected to vary periodically as a function of time, also with period $T$. It is natural to expect that the amplitude of the oscillations in the temperature will become negligible at great depths, $x$, below the surface, so we impose a boundary condition that the temperature approaches a constant value as the depth $x$ approaches infinity. It is of interest to determine how the temperature varies as a function of depth, and how far the temperature variation lags behind the changes at the surface. If the surface is heated or cooled uniformly, we expect that the temperature will depend only on the depth $x$ (and not on any other spatial coordinate) and time $t$, so (although this is a problem posed in three-dimensional space) we are concerned with finding a solution of the one-dimensional diffusion equation.


First consider the formulation of the problem in mathematical terms. The surface which is heated is the plane $x = 0$, and the soil is the half-space $x > 0$. We must determine a solution $\theta(x,t)$ satisfying the diffusion equation, and appropriate boundary conditions. We shall restrict ourselves to looking for solutions which vary periodically in time, with period $T$. (There can be non-periodic disturbances which decrease in amplitude as a function of time, often called the transients, which we shall not consider.) We apply the boundary condition that $\theta(x,t)$ approaches a constant (0, say) as $x \to \infty$. It will be assumed that the heat flux density $J(t)$ (that is, the heat energy per unit time passing through a unit area perpendicular to the surface) at the surface varies sinusoidally:


$$J(t) = J_0 \sin(\omega t), \tag{4.65}$$

where $\omega = 2\pi/T$ is the angular frequency corresponding to the period $T$. (When $J$ is negative, the surface is losing heat.) A general periodic variation of $J(t)$ is discussed later. The heat flux density is related to the temperature gradient at the surface by Fourier's law (discussed in Section 3.6),

$$J(t) = -\kappa\,\frac{\partial\theta}{\partial x}(0,t), \tag{4.66}$$

where $\kappa$ is the thermal conductivity.


We expect that the temperature $\theta(x,t)$ varies with the same period as the changes at the surface. It is convenient to write the sinusoidal function as a sum of two complex exponentials: $\sin(\omega t) = [\exp(i\omega t) - \exp(-i\omega t)]/2i$. We then seek solutions of the one-dimensional diffusion equation $\partial\theta/\partial t = D\,\partial^2\theta/\partial x^2$ in the form

$$\theta(x,t) = \exp(\pm i\omega t)\, f_\pm(x), \tag{4.67}$$

where $\pm$ indicates that we consider both plus and minus signs, corresponding to the two exponentials $\exp(\pm i\omega t)$ which make up the $\sin(\omega t)$ function.




Substitution of the trial solution (4.67) into the diffusion equation gives an ordinary differential equation for $f_\pm(x)$:

$$\pm i\omega f_\pm = D\,\frac{d^2 f_\pm}{dx^2}, \tag{4.68}$$

where $\pm$ indicates that we take a positive or negative sign to correspond with the sign in $\exp(\pm i\omega t)$.


Of course, our final solution to the problem must be a real-valued function, 
because the temperature measured by a thermometer is not a complex num- 
ber. This will be achieved by combining two complex solutions to make a real 
solution. 


The motivation for using complex functions is that they make it easier to solve 
the problem, both by making it easier to spot a solution, and by simplifying 
the algebra. If you need to be convinced of this, try to re-work this problem 
without using complex functions. Even checking that our solution (equation 
(4.76) below) satisfies the diffusion equation is hard work without using complex 
exponentials. 


Equation (4.68) has a solution of the form $f_+(x) = \exp(a_+ x)$, where the subscript $+$ refers to the fact that the multiplier of $\omega$ in the left-hand side of equation (4.68) is $+i$, and where $a_+$ satisfies

$$i\omega = D a_+^2. \tag{4.69}$$

Rearranging this equation to determine $a_+$, we find

$$a_+ = \pm\sqrt{\frac{i\omega}{D}} = \pm\sqrt{\frac{\omega}{2D}}\,(1+i) = \pm K(1+i), \tag{4.70}$$

where we use the fact that $(1+i)^2 = 2i$, so $\sqrt{i} = (1+i)/\sqrt{2}$, and where $K = \sqrt{\omega/2D}$. Note that there are two possible signs for the square root. The solution with the positive sign gives a function $f_+(x) = \exp(Kx)\exp(iKx)$, which grows as $x$ increases. This solution must be rejected, because we assume that the influence of heating the surface becomes negligible at great depth. Only the solution with negative sign should be retained, which gives $f_+(x) = \exp(-Kx)\exp(-iKx)$ as an acceptable solution, and

$$a_+ = -\sqrt{\frac{\omega}{2D}}\,(1+i) \tag{4.71}$$

as the acceptable value for $a_+$.
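
The choice of root can be checked with complex arithmetic. The sketch below is illustrative only: the values of `D` and `omega` are arbitrary, and the variable names are not from the course text. It computes both roots of $i\omega = Da^2$ and keeps the one whose real part is negative, so that $\exp(ax)$ decays with depth.

```python
import cmath
import math

D = 1.0       # diffusion constant (illustrative value)
omega = 1.0   # angular frequency (illustrative value)

K = math.sqrt(omega / (2.0 * D))

# The two roots of  i*omega = D*a**2.
principal = cmath.sqrt(1j * omega / D)    # principal square root, equal to +K(1+i)
roots = [principal, -principal]

# Keep the root whose real part is negative, so exp(a*x) decays as x grows.
a_plus = next(a for a in roots if a.real < 0)

x = 3.0
f_plus = cmath.exp(a_plus * x)            # equals exp(-K x) exp(-i K x)
```

The modulus of `f_plus` is $\exp(-Kx)$, which shows that the imaginary part of the root only rotates the phase while the real part controls the decay.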


Changing the sign in front of $\omega$ in equation (4.68) results in a solution $f_-(x) = \exp(a_- x)$, with $a_-$ satisfying

$$-i\omega = D a_-^2. \tag{4.72}$$

Again, the expression for $a_-$ contains a square root, and one choice of sign leads to an unphysical solution, with $f_-(x)$ increasing as $x$ increases. The second (acceptable) solution has

$$a_- = -\sqrt{\frac{\omega}{2D}}\,(1-i), \tag{4.73}$$


where $(1-i)^2 = -2i$ has been used. The solution of the diffusion equation is a linear combination of the two acceptable solutions, after multiplying by $\exp(i\omega t)$ or $\exp(-i\omega t)$ as appropriate:

$$\begin{aligned}
\theta(x,t) &= A_+ \exp(i\omega t)\, f_+(x) + A_- \exp(-i\omega t)\, f_-(x) \\
&= A_+ \exp(i\omega t + a_+ x) + A_- \exp(-i\omega t + a_- x) \\
&= A_+ \exp\left[i\omega t - \sqrt{\frac{\omega}{2D}}\,(1+i)\,x\right] + A_- \exp\left[-i\omega t - \sqrt{\frac{\omega}{2D}}\,(1-i)\,x\right].
\end{aligned} \tag{4.74}$$


The physical value of the temperature must be a real number. The solution (4.74) is real for all values of $x$ and $t$ if $A_+ = (A_-)^*$ (since then $\theta(x,t) = [\theta(x,t)]^*$, implying that $\theta(x,t)$ is real). Writing $A_\pm = \tfrac{1}{2}A\exp(\pm i\phi)$ gives a real solution of the form

$$\theta(x,t) = A\cos\left(\phi + \omega t - \sqrt{\frac{\omega}{2D}}\,x\right)\exp\left(-\sqrt{\frac{\omega}{2D}}\,x\right), \tag{4.75}$$


using $\cos u = (\exp(iu) + \exp(-iu))/2$. The cosine factor in this solution may be interpreted as a wave travelling in the positive $x$ direction with speed $\sqrt{2D\omega}$. (Compare this with the travelling wave solutions considered in Block I.) The exponential factor in equation (4.75) indicates that the amplitude of this wave decreases with a ‘scale length’ $K^{-1} = \sqrt{2D/\omega}$.
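
To get a feel for the sizes involved, the following sketch evaluates the scale length and wave speed for a daily heating cycle. It borrows the thermal diffusivity quoted for dry sandy soil in Exercise 4.18; the variable names are illustrative and not part of the course text.

```python
import math

D = 0.24e-6                 # thermal diffusivity in m^2/s (dry sandy soil, as in Exercise 4.18)
T = 24 * 60 * 60            # period of the heating cycle in s (one day)
omega = 2.0 * math.pi / T   # angular frequency in 1/s

K = math.sqrt(omega / (2.0 * D))        # decay constant and wavenumber in 1/m
scale_length = 1.0 / K                  # depth over which the amplitude falls by a factor e
speed = omega / K                       # phase speed of the wave, equal to sqrt(2*D*omega)
wavelength = 2.0 * math.pi / K
attenuation_per_wavelength = math.exp(-K * wavelength)   # equals exp(-2*pi)
```

With these values the scale length comes out at roughly 8 cm. Because the same constant $K$ sets both the wavelength and the decay, the amplitude falls by the factor $\exp(-2\pi)$, about 0.002, over a single wavelength, whatever the values of $D$ and $\omega$: the temperature wave is very heavily damped.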


That the travelling wave has a speed proportional to $\sqrt{D\omega}$ can be interpreted through the following (approximate) argument. It is known that the (one-dimensional) diffusion equation has solutions corresponding to having the mean-squared distance, $\langle x^2\rangle$, travelled by a particle over a time period $T$ taking the value $\langle x^2\rangle = 2DT$. (See, for example, Section 3.7 of the previous chapter, particularly equation (3.61) on page 98.) Thus, one can argue that the ‘mean speed’ of this diffusing particle is $\sqrt{\langle x^2\rangle}/T = \sqrt{2D/T}$, which is proportional to $\sqrt{D\omega}$ because $T = 2\pi/\omega$. Since heat is carried by diffusing particles, one might, therefore, expect the temperature wave to travel at a speed proportional to $\sqrt{D\omega}$. Furthermore, note that the ‘length scale’, $K^{-1} = \sqrt{2D/\omega}$, is proportional to $\sqrt{2DT}$, the root-mean-squared distance, $\sqrt{\langle x^2\rangle}$, travelled by a diffusing particle over the repeat period $T$. Thus, one might anticipate that the amplitude of the temperature wave diminishes on length scales larger than $\sqrt{2DT}$ since the diffusing particles (carrying heat from the surface) are less likely to get there.


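The constants $A$ and $\phi$ are determined by requiring that the boundary condition described by equations (4.65) and (4.66) is satisfied. The required solution is

$$\theta(x,t) = \frac{J_0}{\kappa}\sqrt{\frac{D}{\omega}}\,\cos\left(\omega t - \sqrt{\frac{\omega}{2D}}\,x + \frac{5\pi}{4}\right)\exp\left(-\sqrt{\frac{\omega}{2D}}\,x\right). \tag{4.76}$$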


Exercise 4.15 


Confirm this result, by showing that the choices $\phi = 5\pi/4$ and $A = (J_0/\kappa)\sqrt{D/\omega}$ make the solution (4.75) consistent with the boundary condition described by equations (4.65) and (4.66).


The solution (4.76) is illustrated in Figure 4.7. The temperature is plotted as a function of the depth below the surface, for four different times throughout the period of the heat flux density, $J(t)$. For this plot, we have chosen $D = \omega = J_0/\kappa = 1$. When these constants are assigned different values, the form of the curves remains the same (but of course the horizontal and vertical scales of the plots must change). Note that the temperature at points below the surface lags behind that at the surface, and that the amplitude of the temperature wave decreases rapidly with depth.
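
As remarked earlier, checking the solution by hand without complex exponentials is laborious, so a numerical check is a useful sanity test. The sketch below is an illustration only, using the same choice $D = \omega = J_0/\kappa = 1$ as the figure. It evaluates the temperature wave $\theta = (J_0/\kappa)\sqrt{D/\omega}\,\cos(\omega t - Kx + 5\pi/4)\exp(-Kx)$, with $K = \sqrt{\omega/2D}$, and uses central differences to estimate how well it satisfies the diffusion equation and the flux condition at the surface. The function names and step sizes are assumptions of this sketch.

```python
import math

D = 1.0
omega = 1.0
J0_over_kappa = 1.0   # the ratio J_0 / kappa; all three constants set to 1

K = math.sqrt(omega / (2.0 * D))
A = J0_over_kappa * math.sqrt(D / omega)

def theta(x, t):
    """Temperature wave with phase 5*pi/4 and amplitude (J_0/kappa) sqrt(D/omega)."""
    return A * math.cos(omega * t - K * x + 5.0 * math.pi / 4.0) * math.exp(-K * x)

def residual(x, t, h=1e-3):
    """Central-difference estimate of d(theta)/dt - D d2(theta)/dx2."""
    dtheta_dt = (theta(x, t + h) - theta(x, t - h)) / (2.0 * h)
    d2theta_dx2 = (theta(x + h, t) - 2.0 * theta(x, t) + theta(x - h, t)) / h ** 2
    return dtheta_dt - D * d2theta_dx2

def surface_gradient(t, h=1e-5):
    """Central-difference estimate of d(theta)/dx at x = 0.

    The formula for theta is smooth for x < 0 as well, so it can be
    evaluated slightly outside the physical domain for this estimate.
    """
    return (theta(h, t) - theta(-h, t)) / (2.0 * h)
```

For any depth and time, `residual` should be zero to within the accuracy of the finite differences, and `surface_gradient(t)` should agree with $-(J_0/\kappa)\sin(\omega t)$, which is the boundary condition obtained by combining equations (4.65) and (4.66).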




Figure 4.7 The temperature as a function of depth, for four different times, for the case $D = \omega = J_0/\kappa = 1$. The rate $J(t)$ at which heat is supplied is a maximum at $t = \pi/2$, and a minimum at $t = 3\pi/2$.


Exercise 4.16 


What is the temperature at depth $x$ if the temperature at the surface is $\theta_0\sin(\omega t)$? [Hint: The solution is unchanged up to equation (4.75).]


Example 4.3 


Generalise the calculation of the previous exercise to the case where the temperature $\theta_0(t)$ at the surface is an arbitrary periodic function of time, with period $T = 2\pi/\omega$. (Express $\theta_0(t)$ as a Fourier series, and use the fact that the diffusion equation is linear.)


What happens to the relative magnitudes of the terms in your expression as 
you go further from the surface? 


Solution 


Because the diffusion equation is linear, an arbitrary linear combination of its solutions also satisfies the diffusion equation. Any surface temperature which is a periodic function of time, with period $T$, may be written as a Fourier series:

$$\theta_0(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left[a_n\cos(n\omega t) + b_n\sin(n\omega t)\right],$$


where $\omega = 2\pi/T$ is the angular frequency corresponding to $T$. Adapting the results of the previous exercise by replacing $\omega$ with $n\omega$, and by allowing for a Fourier series containing both cosine and sine terms, we see that each component of this Fourier series corresponds to a solution of the diffusion equation of the form (4.75). Combining these solutions with appropriate values of $A$ and $\phi$, so that $\theta(0,t) = \theta_0(t)$ is satisfied, we obtain

$$\theta(x,t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\exp\left(-\sqrt{\frac{n\omega}{2D}}\,x\right)\left[a_n\cos\left(n\omega t - \sqrt{\frac{n\omega}{2D}}\,x\right) + b_n\sin\left(n\omega t - \sqrt{\frac{n\omega}{2D}}\,x\right)\right]. \tag{4.77}$$


The components of the sum with $n = 1$ decrease least rapidly as $x$ increases. Thus, for large values of $x$, only this term and the constant term $a_0/2$ are significant, so the temperature varies approximately sinusoidally with time.
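
The way the higher harmonics die away can be made concrete with a short calculation. The sketch below is illustrative, with hypothetical function names. It evaluates a partial sum of the series solution, in which the $n$-th harmonic of the surface temperature is multiplied by $\exp(-\sqrt{n\omega/2D}\,x)$ and has its phase shifted by $\sqrt{n\omega/2D}\,x$, and it also gives the amplitude of the $n$-th harmonic relative to the first at a given depth.

```python
import math

def temperature(x, t, a0, a, b, omega, D):
    """Partial sum of the series solution for a periodic surface temperature.

    a and b hold the coefficients a_1, a_2, ... and b_1, b_2, ... of the
    surface temperature; a0 is the coefficient in the constant term a0/2.
    """
    total = a0 / 2.0
    for n, (an, bn) in enumerate(zip(a, b), start=1):
        kn = math.sqrt(n * omega / (2.0 * D))   # decay constant of the n-th harmonic
        phase = n * omega * t - kn * x
        total += math.exp(-kn * x) * (an * math.cos(phase) + bn * math.sin(phase))
    return total

def harmonic_ratio(n, x, omega, D):
    """Amplitude of the n-th harmonic relative to the first, for equal surface amplitudes."""
    k1 = math.sqrt(omega / (2.0 * D))
    return math.exp(-(math.sqrt(n) - 1.0) * k1 * x)
```

For example, `harmonic_ratio(4, x, omega, D)` equals $\exp(-\sqrt{\omega/2D}\,x)$, so by the depth at which the fundamental has fallen by a factor $e$, the fourth harmonic has fallen by a factor $e^2$.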




Exercise 4.17 


A surface is alternately exposed to one hot fluid at a temperature $+\theta_0$ for a time $\pi/\omega$, then to a cold fluid at temperature $-\theta_0$ for a time $\pi/\omega$, and the cycle is repeated over and over again.

(a) What is the Fourier series for the surface temperature at time $t$? [Hint: This was obtained for $\omega = 1$ in Block I, Chapter 3: see Example 3.2.]


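(b) Show that the temperature at depth $x$ below the surface at time $t$ is

$$\theta(x,t) = \frac{4\theta_0}{\pi}\sum_{n=0}^{\infty}\frac{1}{2n+1}\,\sin\left[(2n+1)\omega t - \sqrt{\frac{(2n+1)\omega}{2D}}\,x\right]\exp\left[-\sqrt{\frac{(2n+1)\omega}{2D}}\,x\right]. \tag{4.78}$$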


This result is illustrated in Figure 4.8, which shows the time-dependence of the 
temperature at the surface and at three different depths. We can again observe 
that as the depth increases, the amplitude of the temperature wave decreases, 
and that it lags further behind the surface temperature. Also, because the higher 
Fourier components (i.e. terms with larger n) are reduced more rapidly, the time- 
dependence of the temperature wave becomes essentially sinusoidal at large depths. 
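
The behaviour described here can be reproduced by summing the series directly. The following sketch is illustrative only, with default parameter values $\theta_0 = \omega = D = 1$ as in Figure 4.8. It evaluates a truncated version of the square-wave series, in which the odd harmonic $2n+1$ has amplitude $4\theta_0/[\pi(2n+1)]$ at the surface, and compares it with its leading term; the function names and the truncation length are assumptions of this sketch.

```python
import math

def square_wave_temperature(x, t, theta0=1.0, omega=1.0, D=1.0, terms=2000):
    """Partial sum of the square-wave temperature series, truncated after `terms` terms."""
    total = 0.0
    for n in range(terms):
        k = 2 * n + 1                              # only odd harmonics appear
        decay = math.sqrt(k * omega / (2.0 * D))
        total += math.sin(k * omega * t - decay * x) * math.exp(-decay * x) / k
    return 4.0 * theta0 / math.pi * total

def fundamental(x, t, theta0=1.0, omega=1.0, D=1.0):
    """The n = 0 term of the series on its own."""
    decay = math.sqrt(omega / (2.0 * D))
    return 4.0 * theta0 / math.pi * math.sin(omega * t - decay * x) * math.exp(-decay * x)
```

At the surface the partial sum approaches $+\theta_0$ in the middle of the hot half-cycle (slowly, since the terms fall off only like $1/(2n+1)$), whereas at a depth of a few scale lengths the full sum and `fundamental` are nearly indistinguishable.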


Figure 4.8 The temperature $\theta(x,t)$ as a function of time, at depth $x$ below a surface which is alternately heated to temperature $\theta_0 = +1$ and cooled to temperature $-\theta_0$, with a frequency $\omega = 1$. In this illustration, we have set $D = 1$.


Exercise 4.18 


In desert areas, temperatures can be very high during the day and very low at night. 
Desert animals can escape from the extreme temperatures by burrowing into the 
ground. 


In a desert area, the daily high and low temperatures are 40 °C and 0 °C respectively, and the thermal diffusivity of the dry sandy soil is $0.24\times10^{-6}\ \mathrm{m^2\,s^{-1}}$. Assuming that the flux of heat varies sinusoidally throughout the day, estimate the depth to which an animal must burrow so that the temperature of its surroundings does not exceed 25 °C.




4.7 Outcomes 


After reading this chapter, you should: 


• appreciate the concept of a propagator, and be able to determine (with some guidance) the propagator for partial differential equations which are similar in structure to the diffusion equation

• be able to use a propagator to determine a temperature, concentration, or similar scalar field, by evaluating an integral

• be able to construct solutions of the diffusion equation and closely related partial differential equations using the method of separation of variables, in one or more dimensions

• be able to demonstrate orthogonality relations, and be able to use them to calculate Fourier coefficients for particular solutions of diffusion or heat flow problems

• appreciate the existence of temperature waves in periodically heated and cooled structures, and be able to calculate temperature waves in a half-space with a uniformly heated planar boundary.


4.8 Further exercises 


The following exercises provide additional practice in solving typical prob- 
lems of the types treated in this chapter. 


Exercise 4.19 
The diffusion-advection equation, describing diffusion in a moving fluid, was introduced in Chapter 3, Exercise 3.28: in one dimension it takes the form

$$\frac{\partial c}{\partial t} = -\frac{\partial}{\partial x}(vc) + D\,\frac{\partial^2 c}{\partial x^2},$$

where $D$ is the diffusion constant and $v$ is the velocity of flow of the fluid, which in this exercise is assumed to be independent of $x$.
(a) Can you guess the form of the propagator K(x — xo,t) for this equation? 


(b) Calculate the propagator using the Fourier transform approach, as described 
in Section 4.2. 




Exercise 4.20 


Consider a uniform metal bar of length $L$ which can exchange heat with its surroundings such that the temperature obeys a modified diffusion equation of the form

$$\frac{\partial\theta}{\partial t} = D\,\frac{\partial^2\theta}{\partial x^2} - R(\theta - \theta_a),$$

where $D$ is the thermal diffusivity, $R$ is a constant (sometimes called a rate constant), and $\theta_a$ is the temperature of the surrounding air (assumed to be constant).

At time $t = 0$ the bar is removed from the heat sources, so that there is no heat transferred through the ends of the bar, and the boundary conditions are

$$\frac{\partial\theta}{\partial x}(0,t) = \frac{\partial\theta}{\partial x}(L,t) = 0.$$


Using the method of separation of variables, find a general solution of the partial differential equation given above. Determine $\theta(x,t)$ in the form of a Fourier series, when the initial temperature is a specified function, $\theta_0(x)$.


Exercise 4.21


If the bar described in the previous exercise is held in contact with heat sources, so 
that both ends of the bar (x = 0 and x = L) are at temperature 6,, determine the 
steady-state temperature 69(x) at position z. 


Exercise 4.22 


A cube, with sides of length $L$, made from a material with thermal diffusivity $D$, is heated to a uniform initial temperature $\theta_0$. At time $t = 0$, it is plunged into a stream of fast-flowing liquid, so that the temperature everywhere on the surface is equal to zero for all $t > 0$. Show that the temperature at the centre of the cube is (for $t > 0$)

$$\theta(t) = \theta_0\left(\frac{4}{\pi}\right)^3\sum_{n_1=1}^{\infty}\sum_{n_2=1}^{\infty}\sum_{n_3=1}^{\infty}\sin\left(\frac{\pi n_1}{2}\right)\sin\left(\frac{\pi n_2}{2}\right)\sin\left(\frac{\pi n_3}{2}\right)\times\frac{\exp[-(n_1^2+n_2^2+n_3^2)\pi^2 Dt/L^2]}{n_1 n_2 n_3}.$$


Hard exercise 




Solutions to Exercises in Chapter 4 


Solution 4.1 


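Write the concentration in terms of its Fourier transform with respect to $x$:

$$c(x,t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dk\, \exp(ikx)\,\tilde{c}(k,t).$$

Assuming that equation (4.4) is satisfied, we have

$$\int_{-\infty}^{\infty} dk\, \exp(ikx)\left[\frac{\partial\tilde{c}}{\partial t} + Dk^2\,\tilde{c}(k,t)\right] = 0.$$

Using the fact that $(\partial^2/\partial x^2)\exp(ikx) = -k^2\exp(ikx)$, we may replace the factor of $k^2$ by a second derivative with respect to $x$, and obtain

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dk\, \exp(ikx)\,\frac{\partial\tilde{c}}{\partial t} = \frac{D}{\sqrt{2\pi}}\,\frac{\partial^2}{\partial x^2}\int_{-\infty}^{\infty} dk\, \exp(ikx)\,\tilde{c}(k,t).$$

Comparing with the first equation, we see that the left-hand side is $\partial c(x,t)/\partial t$, and the integral on the right-hand side is $\sqrt{2\pi}\,c(x,t)$, so $c(x,t)$ satisfies the diffusion equation.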


Solution 4.2 


Rearrange the given differential equation to read

$$\frac{1}{f}\frac{df}{dt} = -a,$$

which can be integrated to give $\ln f = -at + C$, where $f > 0$ and $C$ is a constant of integration. Taking the exponential of this relation gives

$$f = A\exp(-at),$$

where $A = \exp(C)$. Now apply this to equation (4.4), where $\tilde{c}$ plays the role of the dependent variable $f$, and the constant $a$ is replaced by a function of $k$, namely $a = Dk^2$. Making these substitutions gives $\tilde{c}(k,t) = A\exp(-Dk^2 t)$, where the multiplier $A$ may be a function of $k$. When $t = 0$, this relation reads $\tilde{c}(k,0) = A$, which determines $A$, so

$$\tilde{c}(k,t) = \exp(-Dk^2 t)\,\tilde{c}(k,0).$$
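
The exponential decay of each Fourier mode can be compared with a direct numerical integration of the equation $d\tilde{c}/dt = -Dk^2\tilde{c}$. The sketch below is illustrative: the forward-Euler scheme, the function names and the step count are choices made for this example, not part of the course text.

```python
import math

def mode_amplitude(c_k0, k, t, D):
    """Exact solution: exp(-D k^2 t) times the initial amplitude of the mode."""
    return math.exp(-D * k ** 2 * t) * c_k0

def euler_amplitude(c_k0, k, t, D, steps=100000):
    """Crude forward-Euler integration of d(c)/dt = -D k^2 c, for comparison only."""
    dt = t / steps
    c = c_k0
    for _ in range(steps):
        c += -D * k ** 2 * c * dt
    return c
```

Modes with large $k$ (short wavelength) decay fastest, which is why diffusion smooths out fine detail in the concentration first.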
Solution 4.3 
(a) The initial condition for Example 4.2 may be written

$$c(x,0) = c_0 H(x - a),$$

where $H(x)$ is the step function (Heaviside function) defined in the question.

(b) In this problem, the initial condition is

$$c(x,0) = c_0\,[H(x+L) - H(x-L)].$$

(c) Using equations (4.7) and (4.8), we have

$$c(x,t) = \frac{c_0}{\sqrt{4\pi Dt}}\int_{-\infty}^{\infty} dx_0\, \exp[-(x-x_0)^2/4Dt]\, H(x_0+L) - \frac{c_0}{\sqrt{4\pi Dt}}\int_{-\infty}^{\infty} dx_0\, \exp[-(x-x_0)^2/4Dt]\, H(x_0-L).$$


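The integrals in this expression were obtained in Example 4.2, as part (a) above shows: setting $a$ equal to $-L$ and $+L$ successively, and subtracting one expression from the other, we have

$$c(x,t) = \frac{c_0}{2}\left[\operatorname{erf}\left(\frac{x+L}{\sqrt{4Dt}}\right) - \operatorname{erf}\left(\frac{x-L}{\sqrt{4Dt}}\right)\right].$$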
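
As a numerical check on part (c), the sketch below compares a direct midpoint-rule evaluation of the propagator integral over $-L < x_0 < L$ with the closed form $(c_0/2)[\operatorname{erf}((x+L)/\sqrt{4Dt}) - \operatorname{erf}((x-L)/\sqrt{4Dt})]$. It assumes the Gaussian propagator $\exp[-(x-x_0)^2/4Dt]/\sqrt{4\pi Dt}$; the function names and the number of cells are illustrative.

```python
import math

def kernel(x, x0, t, D):
    """Gaussian propagator of the one-dimensional diffusion equation."""
    return math.exp(-(x - x0) ** 2 / (4.0 * D * t)) / math.sqrt(4.0 * math.pi * D * t)

def c_closed_form(x, t, c0, L, D):
    """Difference of error functions for the top-hat initial condition."""
    s = math.sqrt(4.0 * D * t)
    return 0.5 * c0 * (math.erf((x + L) / s) - math.erf((x - L) / s))

def c_numerical(x, t, c0, L, D, cells=4000):
    """Midpoint-rule integral of c0 * kernel over -L < x0 < L."""
    h = 2.0 * L / cells
    return c0 * h * sum(kernel(x, -L + (j + 0.5) * h, t, D) for j in range(cells))
```

The two functions should agree to several decimal places for any $x$ and any $t > 0$, and for very small $t$ both approach the original top-hat profile.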




Solution 4.4 


Again, starting from equations (4.7) and (4.8), and using the changes of variable $u = (x - x_0)/\sqrt{4Dt}$ and $v = x - x_0$, we have

$$\begin{aligned}
c(x,t) &= \frac{1}{\sqrt{4\pi Dt}}\int_{-\infty}^{\infty} dx_0\, \exp[-(x-x_0)^2/4Dt]\,[A_0 + A_1\cos(kx_0)] \\
&= \frac{A_0}{\sqrt{\pi}}\int_{-\infty}^{\infty} du\, \exp(-u^2) + \frac{A_1}{\sqrt{4\pi Dt}}\int_{-\infty}^{\infty} dv\, \exp(-v^2/4Dt)\cos[k(x-v)].
\end{aligned}$$

Now use the identity $\cos(A - B) = \cos A\cos B + \sin A\sin B$, and evaluate the first integral:

$$c(x,t) = A_0 + \frac{A_1\cos(kx)}{\sqrt{4\pi Dt}}\int_{-\infty}^{\infty} dv\, \exp(-v^2/4Dt)\cos(kv) + \frac{A_1\sin(kx)}{\sqrt{4\pi Dt}}\int_{-\infty}^{\infty} dv\, \exp(-v^2/4Dt)\sin(kv).$$

The first of these integrals is of the form quoted in the Hint, with $\alpha = 1/4Dt$ and $\beta = k$. The second integral is zero, because the integrand is an odd function. The concentration at time $t$ is then

$$c(x,t) = A_0 + A_1\cos(kx)\exp(-k^2 Dt).$$


Solution 4.5 


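Consider the function given in equation (4.16). We can check that this satisfies the diffusion equation, by evaluating partial derivatives:

$$\begin{aligned}
\frac{\partial\theta}{\partial t} &= -\sum_{n=0}^{\infty}\frac{\pi^2 n^2 D A_n}{L^2}\cos\left(\frac{\pi n x}{L}\right)\exp\left(-\frac{\pi^2 n^2 Dt}{L^2}\right), \\
\frac{\partial\theta}{\partial x} &= -\sum_{n=0}^{\infty}\frac{\pi n A_n}{L}\sin\left(\frac{\pi n x}{L}\right)\exp\left(-\frac{\pi^2 n^2 Dt}{L^2}\right), \\
\frac{\partial^2\theta}{\partial x^2} &= -\sum_{n=0}^{\infty}\frac{\pi^2 n^2 A_n}{L^2}\cos\left(\frac{\pi n x}{L}\right)\exp\left(-\frac{\pi^2 n^2 Dt}{L^2}\right).
\end{aligned}$$

Comparing the expressions for $\partial\theta/\partial t$ and $\partial^2\theta/\partial x^2$, it follows that $\theta(x,t)$ satisfies the diffusion equation. We must now check that it also satisfies the boundary conditions, that is, $\partial\theta/\partial x = 0$ at both $x = 0$ and $x = L$, for all values of $t$. At $x = L$, $\partial\theta/\partial x = 0$ because $\sin(n\pi) = 0$ for all integer values of $n$. At $x = 0$, the boundary condition is satisfied because $\sin(0) = 0$.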


Solution 4.6 


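Use equation (4.20) to obtain the Fourier coefficients for $n > 0$ via integration by parts:

$$\begin{aligned}
A_n &= \frac{2}{L}\int_0^L dx\, x\cos\left(\frac{\pi n x}{L}\right) \\
&= \frac{2}{L}\left[\frac{Lx}{\pi n}\sin\left(\frac{\pi n x}{L}\right)\right]_0^L - \frac{2}{\pi n}\int_0^L dx\, \sin\left(\frac{\pi n x}{L}\right) \\
&= \frac{2L}{\pi^2 n^2}\left[(-1)^n - 1\right].
\end{aligned}$$

The Fourier coefficient for $n = 0$ is just the average value of the function, i.e.

$$A_0 = \frac{1}{L}\int_0^L dx\, x = \frac{L}{2}.$$

The Fourier coefficients are therefore

$$A_n = -\frac{2L}{\pi^2 n^2}\left[1 - (-1)^n\right] \quad (n > 0), \qquad A_0 = \frac{L}{2}.$$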
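
The integration by parts can be checked against a direct numerical quadrature. The sketch below is illustrative only: it assumes the function being expanded is $\theta_0(x) = x$ on $0 \le x \le L$, with coefficients given by $(2/L)\int_0^L dx\, x\cos(\pi n x/L)$ for $n > 0$ and by the mean value for $n = 0$. The length `L`, the function names and the number of cells are arbitrary choices.

```python
import math

L = 2.0   # illustrative length

def A_closed_form(n):
    """Closed-form cosine coefficients of theta_0(x) = x on [0, L]."""
    if n == 0:
        return L / 2.0
    return -2.0 * L * (1.0 - (-1.0) ** n) / (math.pi ** 2 * n ** 2)

def A_numerical(n, cells=20000):
    """Midpoint-rule version of the coefficient integrals for theta_0(x) = x."""
    h = L / cells
    prefactor = 1.0 / L if n == 0 else 2.0 / L
    return prefactor * h * sum(
        (j + 0.5) * h * math.cos(math.pi * n * (j + 0.5) * h / L) for j in range(cells)
    )
```

The coefficients with even $n > 0$ vanish, and those with odd $n$ equal $-4L/(\pi^2 n^2)$, so they fall off like $1/n^2$.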




Solution 4.7 


In the steady-state condition, the temperature is independent of time, so $\partial\theta/\partial t = 0$, and the temperature $\theta$ depends only upon $x$. The diffusion equation then reduces to the equation $d^2\theta/dx^2 = 0$, implying that $\theta(x) = \alpha x + \beta$, where $\alpha$ and $\beta$ are constants. The mean value of the temperature is

$$\bar{\theta} = \frac{1}{L}\int_0^L dx\, \theta(x) = \tfrac{1}{2}\alpha L + \beta.$$

The temperature at position $x$ may therefore be written

$$\theta(x) = \alpha(x - L/2) + \bar{\theta},$$

where $\alpha$ is the temperature gradient, and $\bar{\theta}$ is the mean temperature.


Solution 4.8 


The instant after the poker is made thermally isolated, the solution of the diffusion equation in the form (4.16) is applicable. The Fourier coefficients can be deduced from those obtained in the solution to Exercise 4.6. This is because the function $\theta_0(x)$ here is obtained from that of Exercise 4.6 by multiplying by $\alpha$ and adding $\bar{\theta} - \tfrac{1}{2}\alpha L$. The additive constant $\bar{\theta} - \tfrac{1}{2}\alpha L$ does not contribute to the Fourier coefficients $A_n$ for $n > 0$. The Fourier coefficients for $n > 0$ are then those of Exercise 4.6 multiplied by $\alpha$, and the Fourier coefficient $A_0$ (which in Exercise 4.6 is $L/2$) is here replaced by the mean value of the temperature, $\bar{\theta}$. The temperature at position $x$ and time $t$ is then

$$\theta(x,t) = \bar{\theta} - \frac{2\alpha L}{\pi^2}\sum_{n=1}^{\infty}\frac{[1-(-1)^n]}{n^2}\cos\left(\frac{\pi n x}{L}\right)\exp\left(-\frac{\pi^2 n^2 Dt}{L^2}\right).$$

Only terms corresponding to odd values of $n$ contribute to the sum, and by setting $n = 2k+1$ we can express it as a sum which includes only these terms:

$$\theta(x,t) = \bar{\theta} - \frac{4\alpha L}{\pi^2}\sum_{k=0}^{\infty}\frac{1}{(2k+1)^2}\cos\left(\frac{\pi(2k+1)x}{L}\right)\exp\left(-\frac{\pi^2(2k+1)^2 Dt}{L^2}\right).$$


Solution 4.9 
Write $\theta(\mathbf{r},t) = f(\mathbf{r})\,g(t)$, and require that $\theta$ satisfies the diffusion equation:

$$\frac{\partial\theta}{\partial t} = f\,\frac{dg}{dt} = D\nabla^2\theta = Dg\,\nabla^2 f.$$

Dividing these equations by $fg$, we obtain

$$\frac{D\,\nabla^2 f}{f} = \frac{g'}{g}.$$

The left-hand side is independent of $t$, and the right-hand side is independent of $\mathbf{r}$. Both sides must therefore be independent of both $\mathbf{r}$ and $t$, i.e. they are equal to a constant which will be called $-\lambda D$. We therefore have the following separated equations for $f$ and $g$:

$$\nabla^2 f = -\lambda f, \qquad \frac{dg}{dt} = -\lambda D g.$$


Solution 4.10 


In one dimension, the Helmholtz equation is

$$\frac{d^2 f}{dx^2} + \lambda f = 0.$$

This corresponds to equation (4.13) (on setting $k^2 = \lambda$). The function $f(x) = \cos(kx)$, with $k^2 = \lambda$, is a solution of this differential equation. When $k = n\pi/L$ (with $n = 0, 1, 2, \ldots$), this choice also satisfies the Neumann boundary condition $f'(0) = f'(L) = 0$. The eigenfunctions are $f_n(x) = \cos(\pi n x/L)$, and the eigenvalues are $\lambda_n = \pi^2 n^2/L^2$.


Solution 4.11


When $a = b$, the two eigenfunctions

$$f_{n,m}(x,y) = \cos(n\pi x/a)\cos(m\pi y/a), \qquad f_{m,n}(x,y) = \cos(m\pi x/a)\cos(n\pi y/a)$$

(with $n \neq m$) have the same eigenvalue, namely $\lambda = \pi^2(n^2 + m^2)/a^2$, so this eigenvalue is degenerate.

Degeneracies occur whenever the lengths $a$ and $b$ are rationally related, i.e. whenever

$$\frac{a}{b} = \frac{N}{M},$$

where $N$ and $M$ are both integers. As an example, consider the case where $a = 3b$. The eigenfunctions

$$f_{6,1}(x,y) = \cos(6\pi x/3b)\cos(\pi y/b), \qquad f_{3,2}(x,y) = \cos(3\pi x/3b)\cos(2\pi y/b)$$

both have eigenvalue

$$\lambda = \pi^2\left[\left(\frac{2}{b}\right)^2 + \left(\frac{1}{b}\right)^2\right] = \frac{5\pi^2}{b^2}.$$
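
Degenerate pairs such as this one can be found by a systematic search. The sketch below is illustrative and assumes the Neumann eigenvalues of the rectangle have the form $\pi^2(n^2/a^2 + m^2/b^2)$. When $a/b = N/M$ we may write $a = Ns$ and $b = Ms$, and the eigenvalue is then $\pi^2/(NMs)^2$ times the integer $(nM)^2 + (mN)^2$, so modes can be grouped exactly using integer arithmetic. The function name and the search range are hypothetical.

```python
from collections import defaultdict

def degenerate_levels(N, M, n_max):
    """Group modes (n, m) of a rectangle with a/b = N/M by eigenvalue.

    The integer (n*M)**2 + (m*N)**2 is proportional to the eigenvalue and
    serves as an exact key; only keys shared by more than one mode are returned.
    """
    levels = defaultdict(list)
    for n in range(n_max + 1):
        for m in range(n_max + 1):
            levels[(n * M) ** 2 + (m * N) ** 2].append((n, m))
    return {key: modes for key, modes in levels.items() if len(modes) > 1}

shared = degenerate_levels(3, 1, 8)   # the case a = 3b
```

For $a = 3b$ the key of the mode $(6,1)$ is $36 + 9 = 45$, and that of $(3,2)$ is $9 + 36 = 45$, so the two modes appear together in `shared[45]`.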
Solution 4.12 


For the function $f(x) = \exp(ikx)$, we have

$$\frac{df}{dx}(x) = ik\exp(ikx) = ik f(x), \qquad \frac{d^2 f}{dx^2}(x) = -k^2 f(x),$$

so $f$ is an eigenfunction of both of the operators $d/dx$ and $d^2/dx^2$.

Now apply the boundary condition $f(x+a) = f(x)$ to $f(x) = \exp(ikx)$. For any real number $\theta$ and any integer $n$, we have $\exp[i(\theta + 2\pi n)] = \exp(i\theta)\exp(2\pi i n) = \exp(i\theta)$, so $f(x+a) = \exp[i(kx+ka)] = \exp(ikx) = f(x)$ if $ka = 2\pi n$. The boundary condition is therefore satisfied by setting $k = 2\pi n/a$. The spectrum of the operator $d/dx$ with this boundary condition is therefore the set of eigenvalues $2\pi i n/a$, where $n$ is any integer (positive, negative or zero). Similarly, the spectrum of $d^2/dx^2$ is the set of eigenvalues $-(2\pi n/a)^2$, with $n = 0, \pm 1, \pm 2, \ldots$.

In the case of the operator $d^2/dx^2$ with the periodic boundary condition, we see that all of the eigenvalues except for $n = 0$ are doubly degenerate, because for each eigenvalue $-k^2$, the eigenfunctions $\exp(ikx)$ and $\exp(-ikx)$ are linearly independent with this same eigenvalue. There are no degeneracies for $d/dx$ since all eigenfunctions have distinct eigenvalues $2\pi i n/a$.


Solution 4.13 


Let $f_n(\mathbf{r})$ be the eigenfunctions of the Helmholtz equation $(\nabla^2 + \lambda_n)f_n = 0$, satisfying the Dirichlet boundary condition $f_n(\mathbf{r}) = 0$ for all points $\mathbf{r}$ on a closed surface $S$ enclosing a volume $V$.

Consider the volume integral $I_{nm}$ defined by the first expression in the sequence of equalities

$$\begin{aligned}
I_{nm} &= \int_V dV\, \nabla\cdot\left(f_n\,\nabla f_m - f_m\,\nabla f_n\right) \\
&= \int_V dV\, \left[f_n\,\nabla^2 f_m - f_m\,\nabla^2 f_n\right] \\
&= (\lambda_n - \lambda_m)\int_V dV\, f_n f_m,
\end{aligned}$$

where the first step uses the product rule of differentiation in the form $\nabla\cdot(f\,\nabla g) = f\,\nabla^2 g + \nabla f\cdot\nabla g$ (see equation (4.57)), and the second step uses the fact that $f_n$ and $f_m$ satisfy $(\nabla^2 + \lambda)f = 0$. From this expression we conclude that if $I_{nm} = 0$ and the eigenvalues $\lambda_n$ and $\lambda_m$ are different, then the eigenfunctions $f_n$ and $f_m$ are orthogonal.

Now evaluating $I_{nm}$ using Gauss's theorem, we have

$$I_{nm} = \int_S d\mathbf{S}\cdot\left(f_n\,\nabla f_m - f_m\,\nabla f_n\right) = 0,$$

which vanishes because the Dirichlet boundary condition states that $f_n$ and $f_m$ are both zero everywhere on the boundary. Therefore all eigenfunctions with distinct eigenvalues are orthogonal and thus, since we assume that there are no degenerate eigenvalues, equation (4.61) holds after imposing normalisation on $f_n(\mathbf{r})$.


Solution 4.14 


We use the same approach as in finding the solution for Neumann boundary conditions given in Subsection 4.5.2. We require a solution of $(\nabla^2 + \lambda)f = 0$ with $f(\mathbf{r}) = 0$ for all points $\mathbf{r}$ on the boundary of the same rectangle. Again, we attempt a solution by separation of variables, in the form $f(x,y) = X(x)Y(y)$. Inserting this trial solution, we again find that $X(x)$ and $Y(y)$ satisfy $X'' + k_x^2 X = 0$ and $Y'' + k_y^2 Y = 0$, where the primes denote derivatives, and the constants $k_x$ and $k_y$ satisfy $k_x^2 + k_y^2 = \lambda$. The boundary conditions are

$$\begin{aligned}
f(0,y) &= f(a,y) = 0 \quad (0 \le y \le b), \\
f(x,0) &= f(x,b) = 0 \quad (0 \le x \le a).
\end{aligned}$$

These conditions are satisfied by finding functions $X(x)$ and $Y(y)$ which satisfy

$$X(0) = X(a) = 0, \qquad Y(0) = Y(b) = 0.$$

Consider the function $X(x)$. The differential equation $X'' + k_x^2 X = 0$ is solved by $X(x) = \sin(k_x x)$. This automatically satisfies the boundary condition on $X(x)$ at $x = 0$. The boundary condition $X(a) = 0$ is satisfied by choosing $k_x$ such that $k_x a = n\pi$, where $n$ is an integer. Changing the sign of the integer $n$ changes only the sign of the solution, and does not make a distinct (that is, linearly independent) eigenfunction. The value $n = 0$ is excluded because the resulting function $X(x)$ is zero everywhere. We therefore consider only the values $n = 1, 2, 3, \ldots$. Similarly, we find $Y(y) = \sin(m\pi y/b)$, with $m = 1, 2, 3, \ldots$. The eigenfunctions are therefore

$$f_{nm}(x,y) = C_{nm}\sin\left(\frac{n\pi x}{a}\right)\sin\left(\frac{m\pi y}{b}\right),$$

where the constants $C_{nm}$ are chosen to normalise the eigenfunctions, so that

$$1 = \int_0^a dx\int_0^b dy\, |f_{nm}(x,y)|^2 = C_{nm}^2\int_0^a dx\, \sin^2(n\pi x/a)\int_0^b dy\, \sin^2(m\pi y/b).$$

Now

$$\int_0^a dx\, \sin^2(n\pi x/a) = \frac{1}{2}\int_0^a dx\, [1 - \cos(2n\pi x/a)] = \frac{1}{2}\left[x - \frac{a}{2n\pi}\sin\left(\frac{2n\pi x}{a}\right)\right]_0^a = \frac{a}{2},$$

where the identity $\sin^2 x = (1 - \cos 2x)/2$ was used in the second equality.

Similarly,

$$\int_0^b \sin^2(m\pi y/b)\, dy = b/2,$$

so $1 = C_{nm}^2 \times ab/4$, and therefore $C_{nm} = 2/\sqrt{ab}$. The eigenvalues are

$$\lambda_{nm} = k_x^2 + k_y^2 = \left(\frac{n\pi}{a}\right)^2 + \left(\frac{m\pi}{b}\right)^2, \qquad n, m = 1, 2, 3, \ldots.$$
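
The normalisation constant and the orthogonality of these eigenfunctions can be confirmed numerically. The sketch below is illustrative: the side lengths are arbitrary, the midpoint rule is a choice made for this example, and the function names are not from the course text. It uses the eigenfunctions $(2/\sqrt{ab})\sin(n\pi x/a)\sin(m\pi y/b)$.

```python
import math

a, b = 2.0, 1.0   # side lengths of the rectangle (illustrative values)

def f(n, m, x, y):
    """Normalised Dirichlet eigenfunction of the rectangle."""
    return 2.0 / math.sqrt(a * b) * math.sin(n * math.pi * x / a) * math.sin(m * math.pi * y / b)

def inner(n1, m1, n2, m2, cells=200):
    """Midpoint rule in both directions for the integral of f_{n1 m1} f_{n2 m2}."""
    hx, hy = a / cells, b / cells
    total = 0.0
    for i in range(cells):
        x = (i + 0.5) * hx
        for j in range(cells):
            y = (j + 0.5) * hy
            total += f(n1, m1, x, y) * f(n2, m2, x, y)
    return total * hx * hy

def eigenvalue(n, m):
    """Eigenvalue (n pi / a)^2 + (m pi / b)^2 belonging to f_{nm}."""
    return (n * math.pi / a) ** 2 + (m * math.pi / b) ** 2
```

Calling `inner(1, 2, 1, 2)` should return a value close to 1, while `inner(1, 2, 2, 1)` should be close to 0, in line with equation (4.61).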




Solution 4.15 
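From solution (4.75) we find

$$\frac{\partial\theta}{\partial x}(0,t) = A\sqrt{\frac{\omega}{2D}}\,[\sin(\phi + \omega t) - \cos(\phi + \omega t)],$$

so

$$\frac{\partial\theta}{\partial x}(0,t) = A\sqrt{\frac{\omega}{2D}}\,[\sin(\omega t)(\cos\phi + \sin\phi) + \cos(\omega t)(\sin\phi - \cos\phi)].$$

In order to satisfy the boundary condition, we must have $(\partial\theta/\partial x)(0,t) = -J_0\sin(\omega t)/\kappa$. We therefore need $\sin\phi = \cos\phi$ so that $\partial\theta/\partial x$ at $x = 0$ is purely a multiple of $\sin\omega t$ (i.e. the coefficient of $\cos\omega t$ is zero). This can be satisfied by setting $\phi = 5\pi/4$, so $\sin\phi = \cos\phi = -1/\sqrt{2}$. Requiring that the resulting sinusoidal variation has the correct amplitude means that we must set

$$\frac{J_0}{\kappa} = A\sqrt{\frac{\omega}{D}}, \quad\text{so}\quad A = \frac{J_0}{\kappa}\sqrt{\frac{D}{\omega}}.$$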


Solution 4.16 
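Everything up to equation (4.75) remains unchanged. Applying the boundary condition, $\theta(0,t) = \theta_0\sin(\omega t)$, we have

$$\begin{aligned}
\theta(0,t) &= A\cos(\phi + \omega t) \\
&= A\cos\phi\cos\omega t - A\sin\phi\sin\omega t \\
&= \theta_0\sin\omega t,
\end{aligned}$$

which requires

$$A\cos\phi = 0, \qquad -A\sin\phi = \theta_0,$$

which are satisfied for $\phi = -\pi/2$ and $A = \theta_0$. The required solution is therefore

$$\theta(x,t) = \theta_0\sin\left(\omega t - \sqrt{\frac{\omega}{2D}}\,x\right)\exp\left(-\sqrt{\frac{\omega}{2D}}\,x\right).$$

Again, the solution is a travelling wave which decreases in amplitude away from the boundary.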


Solution 4.17 


(a) Here the function $\theta(0,t)$ is a square-wave function: $\theta(0,t) = \theta_0\,\mathrm{sign}[\sin(\omega t)]$.
The Fourier series for a square wave was determined in Block I, Chapter 3,
Example 3.2. Written as a Fourier sine series, using equations (3.17) and
(3.18) from that chapter to convert to $a_n = 0$ and $b_n = 2[1 - (-1)^n]/\pi n$, then
setting $x = \omega t$ and multiplying by $\theta_0$, we obtain

\[ \theta(0,t) = \theta_0 \sum_{n=1}^{\infty} \frac{2[1 - (-1)^n]}{\pi n} \sin(n\omega t). \]

(b) Each term of this series corresponds to a solution of the form determined in
Exercise 4.16. Because we are solving a linear partial differential equation, we
can take a linear combination of these solutions. This linear combination is
chosen so that it corresponds to the Fourier series in part (a) when $x = 0$, and
therefore the required solution is

\[ \theta(x,t) = \theta_0 \sum_{n=1}^{\infty} \frac{2[1 - (-1)^n]}{\pi n} \exp\left(-\sqrt{\frac{n\omega}{2D}}\,x\right) \sin\left(n\omega t - \sqrt{\frac{n\omega}{2D}}\,x\right). \]




In other words, for each term in the series, $\omega$ in Exercise 4.16 is replaced by
$n\omega$, and $\theta_0$ in Exercise 4.16 is replaced by $2\theta_0[1 - (-1)^n]/\pi n$. These Fourier
coefficients vanish for even values of $n$. Writing $n = 2k + 1$, we have

\[ \theta(x,t) = \frac{4\theta_0}{\pi} \sum_{k=0}^{\infty} \frac{1}{2k+1} \sin\left[(2k+1)\omega t - \sqrt{\frac{(2k+1)\omega}{2D}}\,x\right] \exp\left[-\sqrt{\frac{(2k+1)\omega}{2D}}\,x\right]. \]

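
The series over odd harmonics can be checked numerically. The following is a minimal sketch, assuming the form of the solution stated above (each odd harmonic $n = 2k+1$ decays and is phase-shifted at the rate $\sqrt{n\omega/2D}$); the function name, the number of terms and the parameter values are illustrative choices, not taken from the exercise.

```python
import math

def theta_square_wave(x, t, theta0, omega, D, terms=2000):
    """Partial sum of the odd-harmonic series for a square-wave surface temperature."""
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1
        # Decay rate (and spatial phase shift) of the n-th harmonic.
        kappa = math.sqrt(n * omega / (2.0 * D))
        total += math.sin(n * omega * t - kappa * x) * math.exp(-kappa * x) / n
    return 4.0 * theta0 / math.pi * total

# Illustrative (assumed) parameters: period 1, D = 1, unit amplitude.
omega, D, theta0 = 2.0 * math.pi, 1.0, 1.0

surface_first_half = theta_square_wave(0.0, 0.25, theta0, omega, D)   # omega*t = pi/2
surface_second_half = theta_square_wave(0.0, 0.75, theta0, omega, D)  # omega*t = 3*pi/2
deep = theta_square_wave(5.0, 0.25, theta0, omega, D)

print(surface_first_half, surface_second_half, deep)
```

At the surface the partial sum should be close to $+\theta_0$ in the first half-period and $-\theta_0$ in the second, while at depth the oscillation is strongly damped.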
Solution 4.18 


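The amplitude of the temperature wave at the surface is
$\theta_0 = \tfrac{1}{2}(40^\circ\mathrm{C} - 0^\circ\mathrm{C}) = 20^\circ\mathrm{C}$, and the average temperature is
$\bar\theta = \tfrac{1}{2}(40^\circ\mathrm{C} + 0^\circ\mathrm{C}) = 20^\circ\mathrm{C}$ (also assumed
to be the temperature reached at great depths below the surface), so the surface
temperature is given by

\[ \theta(0,t) = \bar\theta + \theta_0\cos\omega t. \]

Hence, using Example 4.3 (with $a_0/2 = \bar\theta$, $a_1 = \theta_0$ and all other Fourier coefficients
being zero) we have

\[ \theta(x,t) = \bar\theta + \theta_0 \exp\left(-\sqrt{\frac{\omega}{2D}}\,x\right) \cos\left(\omega t - \sqrt{\frac{\omega}{2D}}\,x\right). \]

At the depth of the burrow, the amplitude of the temperature wave cannot exceed
$25^\circ\mathrm{C} - 20^\circ\mathrm{C} = 5^\circ\mathrm{C}$, which is smaller than the amplitude at the surface by the
ratio $R = 5/20 = \tfrac{1}{4}$. At a depth $x$, the amplitude of the temperature wave is
reduced by the factor $\exp(-\sqrt{\omega/2D}\,x)$. The minimum depth for the burrow, $x_{\min}$,
therefore satisfies

\[ R = \exp\left(-\sqrt{\frac{\omega}{2D}}\,x_{\min}\right). \]

Also, the angular frequency is $\omega = 2\pi/(24 \times 60 \times 60)\ \mathrm{s}^{-1}$ (since the period is
$24 \times 60 \times 60$, the number of seconds in 24 hours). Substituting the value of $D$
given in the exercise, we find (giving the final result to two significant figures)

\[ x_{\min} = -\sqrt{\frac{2D}{\omega}}\,\ln R = \sqrt{\frac{2D}{\omega}}\,\ln 4 \simeq 11\ \mathrm{cm}. \]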


Solution 4.19 


(a) The effect of the fluid flow is to move particles at uniform speed $v$, so if
$K_0(x - x_0, t)$ is the propagator for the standard diffusion equation (corresponding to the
case $v = 0$), the propagator with advection included is expected to be
$K(x - x_0, t) = K_0(x - x_0 - vt, t)$, that is,

\[ K(x - x_0, t) = \frac{1}{\sqrt{4\pi Dt}} \exp\left[-\frac{(x - x_0 - vt)^2}{4Dt}\right]. \]

(b) The Fourier transform of the advection-diffusion equation (as quoted in the
question) with respect to $x$ is

\[ \frac{\partial\tilde c}{\partial t}(k,t) = (-\mathrm{i}kv - Dk^2)\,\tilde c(k,t), \]

which has the solution

\[ \tilde c(k,t) = \exp[-(Dk^2 + \mathrm{i}kv)t]\,\tilde c(k,0). \]

Following the argument presented in Section 4.2, the Fourier transform of the
propagator is therefore

\[ \tilde K(k,t) = \frac{1}{\sqrt{2\pi}} \exp(-Dtk^2)\exp(-\mathrm{i}vtk). \]

This is the same as for the usual diffusion equation (discussed in Section 4.2), with
the additional factor $\exp(-\mathrm{i}kvt)$. From the result of Block I, Chapter 3, Example
3.6, we see that this corresponds to a translation of the inverse Fourier transform
through a displacement of $vt$, leading to the expression for $K(x - x_0, t)$ quoted in
part (a). (Alternatively, we can use the result of Exercise 3.18, which gives the
inverse Fourier transform of $\exp(-ak^2)\exp(\mathrm{i}kX)$ directly.)
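
As a sanity check on part (a), the shifted Gaussian propagator can be substituted into the advection-diffusion equation numerically. The sketch below assumes the equation has the form $\partial c/\partial t = D\,\partial^2 c/\partial x^2 - v\,\partial c/\partial x$ (the form implied by the Fourier-transformed equation in part (b)) and estimates the derivatives by central differences; the function names, step size and parameter values are illustrative.

```python
import math

def propagator(x, t, D, v, x0=0.0):
    """Gaussian propagator translated by the advective displacement v*t."""
    return math.exp(-(x - x0 - v * t) ** 2 / (4.0 * D * t)) / math.sqrt(4.0 * math.pi * D * t)

def residual(x, t, D, v, h=1e-3):
    """dK/dt - (D d2K/dx2 - v dK/dx), estimated with central differences of step h."""
    dK_dt = (propagator(x, t + h, D, v) - propagator(x, t - h, D, v)) / (2.0 * h)
    dK_dx = (propagator(x + h, t, D, v) - propagator(x - h, t, D, v)) / (2.0 * h)
    d2K_dx2 = (propagator(x + h, t, D, v) - 2.0 * propagator(x, t, D, v)
               + propagator(x - h, t, D, v)) / h ** 2
    return dK_dt - (D * d2K_dx2 - v * dK_dx)

# Illustrative (assumed) values of D, v and t.
D, v, t = 0.5, 2.0, 1.0
max_residual = max(abs(residual(x, t, D, v)) for x in [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
print(max_residual)  # small compared with the propagator itself, which is of order 0.1
```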


Solution 4.20 


Applying the method of separation of variables, writing
$\theta(x,t) - \theta_a = \Delta\theta(x,t) = f(x)g(t)$ and substituting into the partial differential
equation, we find that $g'f = Dgf'' - Rfg$. Dividing by $fg$ and rearranging gives

\[ \frac{g'}{g} + R = D\,\frac{f''}{f} = -\lambda D, \]

where $-\lambda D$ is the separation constant, which we assume should be non-positive.

Solving the equation for $f(x)$, and identifying solutions satisfying the boundary
condition $(\partial\Delta\theta/\partial x)(0,t) = (\partial\Delta\theta/\partial x)(L,t) = 0$, we find normalised eigenfunctions
which are identical to those obtained when $R = 0$, that is,

\[ f_n(x) = \sqrt{\frac{2 - \delta_{n0}}{L}}\,\cos(n\pi x/L), \quad n = 0, 1, 2, \ldots, \]

so $\lambda = (n\pi/L)^2$. The orthogonality relation is unchanged.

The equation for $g(t)$ is solved by $g(t) = \exp[-(\lambda D + R)t]$. The general solution of
the partial differential equation is

\[ \theta(x,t) = \theta_a + \sum_{n=0}^{\infty} a_n \sqrt{\frac{2 - \delta_{n0}}{L}}\,\cos(n\pi x/L)\, \exp\left[-\left(R + \frac{n^2\pi^2 D}{L^2}\right)t\right]. \]

Using the orthogonality relation (or equation (4.20)), the Fourier coefficients $a_n$ are

\[ a_n = \sqrt{\frac{2 - \delta_{n0}}{L}} \int_0^L dx\, \cos(n\pi x/L)\,[\theta(x,0) - \theta_a]. \]

Note that this solution was performed using normalised eigenfunctions (as described
in Subsection 4.4.3). An alternative approach is to use the method set out in Section
4.3, that is, with un-normalised eigenfunctions, $\cos(n\pi x/L)$, so that $a_n$ in both of
the above equations can be replaced by $A_n\sqrt{L/(2 - \delta_{n0})}$.


Solution 4.21 


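Let $\Delta\theta = \theta - \theta_a$ be the difference between the temperature of the bar and the
surrounding air. The steady-state solutions, satisfying $\partial\Delta\theta/\partial t = 0$, also satisfy the
differential equation

\[ \frac{d^2\Delta\theta}{dx^2} = \frac{R}{D}\,\Delta\theta, \]

which has solutions

\[ \Delta\theta(x) = A\cosh\left(\sqrt{\frac{R}{D}}\,x\right) + B\sinh\left(\sqrt{\frac{R}{D}}\,x\right), \]

where $A$ and $B$ are constants which must be chosen to make the solution satisfy
the boundary conditions. The required solution satisfies $\theta_1 = \theta(0) = \theta(L)$, so

\[ \theta_1 - \theta_a = A = A\cosh\left(\sqrt{\frac{R}{D}}\,L\right) + B\sinh\left(\sqrt{\frac{R}{D}}\,L\right), \]

implying that $A = \theta_1 - \theta_a$ and

\[ B\sinh\left(\sqrt{\frac{R}{D}}\,L\right) + (\theta_1 - \theta_a)\cosh\left(\sqrt{\frac{R}{D}}\,L\right) = \theta_1 - \theta_a. \]

The steady-state temperature is therefore

\[ \theta(x) = \theta_a + (\theta_1 - \theta_a)\left[\cosh\left(\sqrt{\frac{R}{D}}\,x\right) + \alpha\sinh\left(\sqrt{\frac{R}{D}}\,x\right)\right], \]

where

\[ \alpha = \frac{1 - \cosh(\sqrt{R/D}\,L)}{\sinh(\sqrt{R/D}\,L)}. \]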

Solution 4.22 


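Here we must solve the Helmholtz equation $(\nabla^2 + \lambda) f = 0$ in three dimensions, for
a cubic region and with Dirichlet boundary conditions. The solution is obtained
by separation of variables, writing $f(x,y,z) = f_1(x) f_2(y) f_3(z)$. The approach is
analogous to the treatment of the rectangular region with a Dirichlet boundary
condition in two dimensions, given in Exercise 4.14. If the cube is positioned so
that its sides are in the planes $x = 0$, $x = L$, $y = 0$, $y = L$, $z = 0$, $z = L$, the solution
is a product of sine functions, such as $\sin(n_1\pi x/L)$, with $n_1$ a positive integer. The
normalised eigenfunctions are

\[ f_{\mathbf n}(x,y,z) = \left(\frac{2}{L}\right)^{3/2} \sin\left(\frac{n_1\pi x}{L}\right) \sin\left(\frac{n_2\pi y}{L}\right) \sin\left(\frac{n_3\pi z}{L}\right), \]

with eigenvalues

\[ \lambda_{\mathbf n} = \frac{\pi^2}{L^2}\,(n_1^2 + n_2^2 + n_3^2). \]

The general solution is (using equation (4.53))

\[ \theta(\mathbf r,t) = \sum_{\mathbf n} a_{\mathbf n} f_{\mathbf n}(\mathbf r) \exp(-\pi^2 |\mathbf n|^2 Dt/L^2), \]

where $\mathbf n = (n_1, n_2, n_3)$ and $\mathbf r = (x,y,z)$, and the $a_{\mathbf n}$ are Fourier coefficients which
are chosen to make this solution match the initial temperature distribution at $t = 0$.
(The summation is over all triples of positive integers $\mathbf n = (n_1, n_2, n_3)$.) The
eigenfunctions are orthonormal on the cube, and the Fourier coefficients are

\[ a_{\mathbf n} = \int_V dV\, \theta_0(\mathbf r)\, f_{\mathbf n}(\mathbf r). \]

The initial condition is that $\theta_0(\mathbf r) = \theta_0$ is constant throughout the cube, so the Fourier
coefficients are

\[ a_{\mathbf n} = \theta_0 \int_V dV\, f_{\mathbf n}(\mathbf r)
 = \theta_0 \left(\frac{2}{L}\right)^{3/2} \int_0^L dx\, \sin(n_1\pi x/L) \int_0^L dy\, \sin(n_2\pi y/L) \int_0^L dz\, \sin(n_3\pi z/L) \]
\[ \phantom{a_{\mathbf n}} = \theta_0 \left(\frac{2}{L}\right)^{3/2} \left(\frac{L}{\pi}\right)^3 \frac{[1 - (-1)^{n_1}][1 - (-1)^{n_2}][1 - (-1)^{n_3}]}{n_1 n_2 n_3}, \]

using

\[ \int_0^L dx\, \sin(n_1\pi x/L) = -\frac{L}{n_1\pi}\Big[\cos(n_1\pi x/L)\Big]_0^L = \frac{L}{n_1\pi}\,[1 - (-1)^{n_1}] \]

and similarly for the integrals in $y$ and $z$. Also, at position
$\mathbf r_c = (\tfrac12 L, \tfrac12 L, \tfrac12 L)$, the eigenfunctions take the values

\[ f_{\mathbf n}(\mathbf r_c) = \left(\frac{2}{L}\right)^{3/2} \sin(n_1\pi/2)\sin(n_2\pi/2)\sin(n_3\pi/2). \]

Note that $f_{\mathbf n}(\mathbf r_c) = 0$ unless $n_1$, $n_2$, $n_3$ are all odd, in which case the factors
$[1 - (-1)^{n_i}]$ are all equal to two. Inserting the Fourier coefficients and the values
of $f_{\mathbf n}(\mathbf r_c)$ into the general solution gives the result quoted in the exercise.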


CHAPTER 5 
The central limit theorem 


5.1 Introduction 


Up to now this block has discussed two topics which might at first sight 
appear to be unrelated: Chapter 2 discussed the statistical theory of ran- 
dom walks, and Chapters 3 and 4 considered diffusion processes and the 
flow of heat. There have already been indications that random walks and 
diffusion are closely related: for example, the result that the second moment
of a probability distribution for these processes increases linearly with time,
$\langle X^2\rangle = 2Dt$, appeared in Chapter 2 (for a situation where $D = \tfrac12$, Section
2.4) and in Chapter 3 (Section 3.7). Chapter 6 will show that if a large
number of particles are independently undergoing random walks, their
concentration satisfies the diffusion equation. This chapter discusses a result in
statistics known as the central limit theorem which will be used in Chapter
6 and which is very useful in its own right.


You have already seen (in Chapter 2, section 2.6) that the probability distri- 
bution for a simple random walk may be well approximated by a Gaussian 
function when the number of steps is large. The central limit theorem (in 
the form described here) is an extension of this result to the case in which 
the random variable has a continuous (as opposed to discrete) distribution. 
The theorem shows that, under quite general conditions, the probability 
density of a sum of N random variables approaches a Gaussian probability 
density function when N is large. It is a very important result in proba- 
bility theory, because it helps to explain why Gaussian distributions are so 
commonly observed. In the context of random walks, the random variables 
which are summed are the displacements at each step. 


Chapter 6 will consider the concentration of a large number of particles 
which follow independent random walks. The central limit theorem will be 
used there to establish that the concentration obeys the diffusion equation 
in systems without boundaries. A more difficult approach will also be used 
there to derive the diffusion equation directly in systems with boundaries. 


Both this chapter and Chapter 6 will contain fewer exercises. The results 
which are derived are very powerful, but the derivations are difficult com- 
pared to those in most other parts of the course. If you find the derivations 
challenging, you should concentrate on the exercises which illustrate applica- 
tions of the results. The assessment relating to this chapter will concentrate 
on testing your appreciation of how to apply the central limit theorem, 
rather than its derivation. The material in Sections 5.4 and 5.5 will not be 
assessed. 


Some of the exercises in this chapter use proof by induction in their solution. 
This technique will not be required in the assessment of the course, and if 
you are not familiar with this approach, you should turn directly to the 
solutions to these exercises, where the method is explained. 




This chapter relies heavily on Fourier transforms and the convolution theo- 
rem. You might find it useful to review Section 3.5 of Block I, Chapter 3, 
now. 


5.2 The central limit theorem 


Many probability density distributions occurring in practical problems are 
found to be very well approximated by normal distributions. In most cases 
this arises because the quantity of interest is the sum of a large number 
of random variables. It will be shown that these sums typically have a 
probability density which is closely approximated by a Gaussian function. 


More precisely, we have the following. Consider a variable $X$ which is the
sum of $N$ random variables $x_i$:

\[ X = \sum_{i=1}^{N} x_i. \tag{5.1} \]

For simplicity, we assume that these variables are independent, and that
they all have the same probability density, given by the function $p(x_i)$.
Both of these assumptions can be relaxed, to a certain extent. Some of the
arguments will assume certain properties of the function $p(x)$. For now,
think of $p(x)$ as being a smooth function which decreases very rapidly in the
limits as $x \to \pm\infty$. We shall see later that the only conditions that must be
imposed on $p(x)$ are that the mean $\langle x\rangle$ and second moment $\langle x^2\rangle$ both exist.


We use the symbol $p_X(X)$ to denote the probability density for the sum $X$
(using the subscript $X$ to distinguish this probability density from that of
each component of the sum, $p(x)$). We aim to show that in the limit as
$N \to \infty$, the probability density $p_X(X)$ of $X$ approaches a normal distribution

\[ p_X(X) = \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp[-(X - \langle X\rangle)^2/2\sigma_X^2]. \tag{5.2} \]

The mean $\langle X\rangle$ and variance $\sigma_X^2$ of this distribution are readily expressed in
terms of the mean $\langle x\rangle$ and variance $\sigma^2$ of the $x_i$:

\[ \langle X\rangle = N\langle x\rangle, \tag{5.3} \]

\[ \sigma_X^2 = N\langle (x - \langle x\rangle)^2\rangle = N\sigma^2. \tag{5.4} \]


The result contained in equations (5.2), (5.3) and (5.4) is known as the
central limit theorem. The limiting probability density (5.2) is independent
of the form of the probability density of each component of $X$ (provided
that the mean $\langle x\rangle$ and variance $\sigma^2$ of the probability density $p(x)$ exist).
This insensitivity to the form of $p(x)$ makes the central limit theorem a very
powerful result, and perhaps a rather surprising one. How can it be that
the form of $p(x)$ becomes irrelevant as we add more variables?
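
The statement can be explored by simulation. The following is a minimal sketch, assuming each $x_i$ is drawn from an exponential density with unit mean and unit variance (an arbitrary, strongly non-Gaussian choice made only for illustration); it compares the sample mean and variance of the sum with $N\langle x\rangle$ and $N\sigma^2$, and the fraction of sums within one standard deviation of the mean with the Gaussian value of about 0.683. The function name, seed and sample sizes are illustrative.

```python
import math
import random

def sample_sums(n_terms, n_samples, rng):
    """Draw n_samples realisations of X = x_1 + ... + x_N with exponential x_i."""
    return [sum(rng.expovariate(1.0) for _ in range(n_terms)) for _ in range(n_samples)]

rng = random.Random(0)   # fixed seed so that the run is reproducible
N = 50                   # number of terms in each sum
sums = sample_sums(N, 20000, rng)

mean_X = sum(sums) / len(sums)
var_X = sum((s - mean_X) ** 2 for s in sums) / len(sums)

# For this density <x> = 1 and sigma^2 = 1, so the theorem predicts <X> = N, sigma_X^2 = N.
sigma_X = math.sqrt(N)
within = sum(abs(s - N) <= sigma_X for s in sums) / len(sums)

print(mean_X, var_X, within)
```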


As an illustration of the central limit theorem, in Figure 5.1 we display
graphs of the probability densities for sums of $N$ random variables with
uniform probability density on the interval $[0,1]$, up to $N = 5$. For $N = 1$,
the probability density is a top-hat function, and for $N = 2$ you will see
shortly (in Exercise 5.7) that the probability density is a function with a
triangular graph. As $N$ increases, the curves rapidly look more and more
like Gaussian functions: a Gaussian probability density with the correct
mean and variance ($\tfrac{5}{2}$ and $\tfrac{5}{12}$, respectively) is plotted for comparison with
the $N = 5$ case.


Figure 5.1 Illustrating the central limit theorem. The probability density, $p_X$, of
the sum of $N$ random variables, with each random variable having a uniform
distribution, $p(x) = 1$, on the interval $[0,1]$, is plotted for $N = 1,2,3,4,5$. The
probability densities approach Gaussian functions as $N \to \infty$. For $N = 5$, the
Gaussian approximation is already very close to the exact probability density.
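
The comparison shown in the figure can be reproduced numerically. The sketch below uses a standard closed-form expression for the density of a sum of $n$ uniform variables on $[0,1]$ (often called the Irwin-Hall density; it is not derived in this text and is used here only as a convenient check) and compares it with the Gaussian of mean $n/2$ and variance $n/12$. Function names and the grid are illustrative.

```python
import math

def uniform_sum_density(x, n):
    """Exact density of the sum of n independent uniform [0, 1] variables."""
    if x < 0.0 or x > n:
        return 0.0
    total = 0.0
    for k in range(int(math.floor(x)) + 1):
        total += (-1) ** k * math.comb(n, k) * (x - k) ** (n - 1)
    return total / math.factorial(n - 1)

def gaussian_density(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

n = 5
grid = [i * 0.01 for i in range(501)]   # points covering the interval [0, 5]
max_difference = max(
    abs(uniform_sum_density(x, n) - gaussian_density(x, n / 2.0, n / 12.0)) for x in grid
)
peak_exact = uniform_sum_density(2.5, n)
peak_gauss = gaussian_density(2.5, n / 2.0, n / 12.0)

print(peak_exact, peak_gauss, max_difference)
```

For $n = 2$ the same function returns the triangular density, for example `uniform_sum_density(0.5, 2)` evaluates to 0.5.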


You have already seen a closely related result, for the case of a random variable 
which takes a discrete set of values, as opposed to the continuous case consid- 
ered in this chapter. The displacement of a random walk is the sum of many 
independent, identically distributed steps. It was shown in Chapter 2 (Section 
2.6) that this is well approximated by a normal distribution. There are slight 
differences: the random walk considered in Chapter 2 had a displacement which 
takes only integer values with a specified distribution, whereas the version of 
the central limit theorem stated above applies to a large class of continuous 
distributions. It is therefore a much more general result than that discussed in 
Chapter 2. 


The central limit theorem is plainly a very important result, which deserves 
to be understood in its own right, as well as in relation to the diffusion 
equation. The remaining sections of this chapter will discuss results leading 
up to the derivation of the central limit theorem, but before tackling these 
sections you should consider Example 5.1 and Exercises 5.4 and 5.5, which 
illustrate some of its applications. 


Before tackling exercises on the central limit theorem itself, you should con- 
sider the following preparatory exercises. 


Exercise 5.1


Let $x_1$ and $x_2$ be two independent random variables, with mean values $\langle x_1\rangle$ and
$\langle x_2\rangle$, respectively, and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. What is the mean value of
$X = x_1 + x_2$? Show that the variance of $X$ is $\sigma_X^2 = \sigma_1^2 + \sigma_2^2$.


Exercise 5.2 


Consider the sum $X_N = \sum_{i=1}^{N} x_i$ of $N$ independent random variables $x_i$, all of which
have the same mean $\langle x\rangle$ and variance $\sigma^2$. Show that the mean, $\langle X_N\rangle$, and variance,
$\sigma_N^2$, of $X_N$ are $\langle X_N\rangle = N\langle x\rangle$ and $\sigma_N^2 = N\sigma^2$, respectively. For convenience in this
exercise, we have used a slightly different notation: $X_N$ instead of $X$, and $\sigma_N^2$ in
place of the more usual $\sigma_X^2$.


Hint: The second part requires proof by induction.




Exercise 5.3 


Let $x_i$ be a collection of $N$ independent random variables, all having the same
probability density, which is uniform on $[0,1]$ and zero elsewhere. What are the
mean and variance of each of the $x_i$? What are the mean and variance of
$X = \sum_{i=1}^{N} x_i$?


The following example and exercises involve knowing the probability that 
a Gaussian random variable is less than a certain value. This probability 
may be determined from the normal probability function, N(x), which was 
defined by equation (1.55), and tabulated in Table 1.2. It is conventional 
to use the symbol N both for the number of variables in the sum, and for 
the normal distribution function. The latter will always be followed by the 
argument of the function in parentheses, to avoid confusion. 


Example 5.1 


A space satellite is assembled from 480 components. Each of the components 
was weighed accurately, but only the number of grams was recorded, with 
the part of the scale reading giving fractions of a gram being discarded (that 
is, the accurate weight was truncated to a whole number of grams). The 
manufacturer is asked to provide precise information about the weight of 
the completed satellite, but does not want to disrupt the assembly process 
to weigh everything again. 


The sum of the recorded weights of the components was 19.750 kg. A clause
in the contract to build the satellite will penalise the manufacturer if the 
weight exceeds 20 kg. 


(a) By what amount might the weight of the completed satellite be expected 
to exceed the sum of the recorded weights of its components? 


(b) Suggest a reasonable guess as to the probability distribution of the error 
in the weight of an individual component. 


(c) What are the mean and variance of this distribution? 


(d) What is the probability distribution of the amount by which the weight 
of the completed satellite might exceed the sum of the recorded weights 
of its components? 


(e) Should the manufacturer be concerned about the risk that the weight 
will exceed 20 kg? 


Solution 


(a) The fractional part of the weight in grams of each component is a number
between 0 and 1, about which we have no information. It is natural to
model the excess weight of the component with index $i$ as a random
number $x_i$, with mean value $\tfrac12$. The excess weight of the completed
machine is then $X = \sum_{i=1}^{N} x_i$, which has mean value
$\langle X\rangle = N\langle x\rangle = 480 \times 0.5\ \mathrm{g} = 240\ \mathrm{g}$.


The final weight of the satellite can be regarded as a random variable
which has mean value 19.750 + 0.240 = 19.990 kg.


(b) We know that the weighing errors $x_i$ (measured in grams) lie between 0
and 1, but have no other information about them. We therefore assume
that their probability density $p(x)$ is a uniform distribution between 0
and 1.


(c) The mean and variance of this uniform distribution were obtained in
Exercise 5.3: they are $\langle x\rangle = \tfrac12$ and $\sigma^2 = \tfrac{1}{12}$, respectively.




(d) Using the central limit theorem and the result of Exercise 5.3, we can
write down a probability density for the excess weight $X$ in grams:

\[ p_X(X) = \frac{1}{\sqrt{2\pi \times 480/12}} \exp\left[-\frac{(X - 240)^2}{2 \times 480 \times \tfrac{1}{12}}\right]. \]

That is, $X$ is expected to have a normal distribution with mean value
$\langle X\rangle = 240$ and standard deviation $\sigma_X = \sqrt{40} = 2\sqrt{10}$.


(e) The penalty clause will be invoked if $X$ exceeds 20000 $-$ 19750 = 250,
that is, 10 above the mean value. This is $10/(2\sqrt{10}) = \sqrt{10}/2 = 1.58\ldots$
multiples of the standard deviation. From Table 1.2 in Chapter 1, giving
values of the normal distribution function $N(x)$, we see that the
probability that a Gaussian random variable exceeds the mean by more
than 1.5 standard deviations is approximately 0.067. (Note, by symmetry
of the Gaussian distribution, that this probability is identical to the
probability that the Gaussian random variable is less than the mean by
more than 1.5 standard deviations, i.e. $N(-1.5)$.) There is a small but
significant risk that the satellite will be overweight.
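
The arithmetic in parts (d) and (e) can be checked with a few lines of code. The following is a minimal sketch, assuming that the normal probability function $N(x)$ is the cumulative distribution of a standard Gaussian variable (evaluated here with the complementary error function); the variable and function names are illustrative.

```python
import math

def normal_cdf(z):
    """Probability that a standard Gaussian variable is less than z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

n_components = 480
mean_excess = n_components * 0.5                # grams, N<x> with <x> = 1/2
sigma_excess = math.sqrt(n_components / 12.0)   # grams, sqrt(N sigma^2) with sigma^2 = 1/12

threshold = 20000.0 - 19750.0                   # grams of excess that trigger the penalty
z = (threshold - mean_excess) / sigma_excess    # number of standard deviations above the mean

p_rounded = normal_cdf(-1.5)   # value read from a table at 1.5 standard deviations
p_unrounded = normal_cdf(-z)   # same calculation without rounding z down to 1.5

print(mean_excess, sigma_excess, z, p_rounded, p_unrounded)
```

Rounding $z$ down to 1.5 slightly overestimates the risk; using $z \simeq 1.58$ gives a probability a little below 0.06, which does not change the conclusion.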


Exercise 5.4 


The probability density for the time $t$ taken for a light bulb to fail is given by the
exponential distribution (the waiting-time distribution of a Poisson process),
$p(t) = \exp(-t/\tau)/\tau$. (Of course, this applies only for $t > 0$,
and is zero for $t < 0$.) The manufacturer claims that the mean time before failure
is 100 hours. Use the central limit theorem to determine a good approximation to
the probability density for the total time for ten bulbs to fail. If ten bulbs have
failed after a total of 580 hours, would you doubt the manufacturer’s claim?


[Hint: The times $t_i$ for individual bulbs to fail are independent random variables,
all with the same probability density. Apply the central limit theorem to the sum of
the times for $N$ bulbs to fail, $T = \sum_{i=1}^{N} t_i$, and use this to estimate the probability
density for 10 bulbs failing after time $T$. Then determine the probability of 10
bulbs failing after 580 hours in terms of the normal probability function $N(x)$.]


Exercise 5.5 


Coins can be counted by weight. It is assumed that the weights of £1 coins in 
a batch are independent random variables, with mean value 10g and standard 
deviation 0.1g. What are the mean and standard deviation for the weight of a 
batch of coins worth £100? What is the probability density of this weight? 


This method of counting coins by weight is considered acceptable if the probability 
of counting incorrectly is less than 107°. Is this method acceptable for counting 
100 £1 coins? 


If this method is to be used reliably, coins from different sources should be mixed 
before being counted. Why might this precaution be necessary? 




5.3 Distribution of sums of random 
variables 


Before considering sums of $N$ random variables, let us consider the
probability density for the sum of just two random variables. Let $X$ be the sum
of two independent random variables, $x_1$ and $x_2$, with probability densities
$p_1(x_1)$ and $p_2(x_2)$, respectively. What is the probability density $p_X(X)$ for
$X = x_1 + x_2$?


We shall quote the result and discuss its structure before giving a derivation.
The probability density for the sum $X = x_1 + x_2$ is

\[ p_X(X) = \int_{-\infty}^{\infty} dx_1\, p_1(x_1)\, p_2(X - x_1). \tag{5.5} \]

The form of this expression should not be surprising. We can obtain a given
value of $X = x_1 + x_2$ with any possible value of $x_1$, but having fixed $x_1$, the
value of $x_2$ is now determined: $x_2 = X - x_1$. We therefore expect to have
an expression containing one integral, over $x_1$. We expect that this should
contain the probability densities for $x_1$ and $x_2 = X - x_1$, hence the factors
$p_1(x_1)$ and $p_2(x_2) = p_2(X - x_1)$.


Note that equation (5.5) indicates that $p_X$ is the convolution of $p_1$ and $p_2$,
as defined in Block I, Chapter 3 (Section 3.5). The probability density for
the sum of two independent random variables is therefore the convolution
of their individual probability densities. This may be written in symbolic
form, using $\otimes$ to denote the convolution operation, as

\[ p_X = p_1 \otimes p_2. \tag{5.6} \]


You will see that the convolution theorem will prove to be very useful in 
understanding the probability density for a sum of random variables. 
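
The convolution integral (5.5) can be approximated on a grid by a Riemann sum. The following is a minimal sketch, assuming both densities are sampled at the midpoints $x = (i + \tfrac12)h$ of cells of width $h$; it is applied to two uniform densities on $[0,1]$, for which the result is the triangular density mentioned in Section 5.2. The function name, grid spacing and sampling convention are illustrative choices.

```python
def convolve_densities(p1, p2, h):
    """Riemann-sum approximation to the convolution of two sampled densities.

    p1[i] and p2[j] are density values at the cell midpoints (i + 1/2) * h and
    (j + 1/2) * h, so out[m] approximates the density of the sum at X = (m + 1) * h.
    """
    out = [0.0] * (len(p1) + len(p2) - 1)
    for i, a in enumerate(p1):
        for j, b in enumerate(p2):
            out[i + j] += a * b * h
    return out

h = 0.01
uniform = [1.0] * 100            # p(x) = 1 sampled on 100 cells covering [0, 1]
p_sum = convolve_densities(uniform, uniform, h)

def density_at(X):
    """Look up the tabulated density of the sum at a grid value X = (m + 1) * h."""
    return p_sum[round(X / h) - 1]

total_probability = sum(p_sum) * h
print(density_at(0.5), density_at(1.0), density_at(1.5), total_probability)
```

The tabulated values rise linearly to a peak at $X = 1$ and fall linearly to zero at $X = 2$, and the total probability remains equal to one.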


Now we give a derivation of equation (5.5). We consider the element of
probability $\delta P$ that $x_1 + x_2$ lies between $X - \tfrac12\delta X$ and $X + \tfrac12\delta X$. Dividing
$\delta P$ by the width of the interval, $\delta X$, and taking the limit as $\delta X \to 0$, we
obtain $p_X(X)$. To calculate $\delta P$, we refer to the discussion of the definition
of the probability density for two random variables, discussed in Chapter
1, Subsection 1.1.5. The probability that a condition is fulfilled is obtained
by integrating the probability density over the region where the condition is
satisfied: see equation (1.19). In this case we have two random variables, $x_1$
and $x_2$, and the element of probability $\delta P$ can be obtained by integrating
their joint probability density, $p(x_1, x_2)$, multiplied by a function which is
unity when $X - \tfrac12\delta X < x_1 + x_2 < X + \tfrac12\delta X$ and zero elsewhere. The region
where this last function differs from zero is illustrated in Figure 5.2.


Figure 5.2 The probability that the sum $x_1 + x_2$ of two random variables lies
between $X - \tfrac12\delta X$ and $X + \tfrac12\delta X$ is obtained by integrating their joint probability
density $p(x_1, x_2)$ over the shaded strip (which extends to infinity in both
directions)


The required function can be obtained from the ‘characteristic function’,
introduced in Block I, Chapter 3, Section 3.3, equation 3.26. We write

\[ \delta P = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2\; p(x_1, x_2)\, \chi_{\frac12\delta X}(x_1 + x_2 - X), \tag{5.7} \]

where $\chi_\varepsilon(x)$ is unity if $x$ lies in the interval $[-\varepsilon, \varepsilon]$, and zero otherwise, so
the factor $\chi_{\frac12\delta X}$ selects those values of $x_1$ and $x_2$ for which the value of
$x_1 + x_2$ lies in the required interval. The probability density of $X$ is
determined from equation (5.7) by taking a limit:

\[ p_X(X) = \lim_{\delta X \to 0} \frac{\delta P}{\delta X}
 = \lim_{\delta X \to 0} \frac{1}{\delta X} \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2\; p(x_1, x_2)\, \chi_{\frac12\delta X}(x_1 + x_2 - X). \tag{5.8} \]

This expression looks complicated, but it turns out that one of the integrals
can be evaluated very easily. To see how this comes about, consider the
following function $g(X)$, defined in terms of a given continuous function
$f(x)$ by a combination of a limit and an integral:

\[ g(X) = \lim_{\delta X \to 0} \frac{1}{\delta X} \int_{-\infty}^{\infty} dx\; f(x)\, \chi_{\frac12\delta X}(x - X). \tag{5.9} \]

A combination of an integral and a limit of this type will be required to
obtain $p_X(X)$ from equation (5.8). Using the fact that the characteristic
function is zero outside the interval $[X - \tfrac12\delta X, X + \tfrac12\delta X]$, the integral is
simplified:

\[ g(X) = \lim_{\delta X \to 0} \frac{1}{\delta X} \int_{X - \frac12\delta X}^{X + \frac12\delta X} dx\; f(x). \tag{5.10} \]

Provided that $f(x)$ is continuous at $x = X$, when $\delta X$ is small the function
$f(x)$ is approximately equal to the constant $f(X)$ over the range of the
integral in equation (5.10). This integral can therefore be approximated by
$f(X)\,\delta X$, so $g(X) = f(X)$: that is, for any function $f(x)$ which is continuous
at $X$, we have

\[ f(X) = \lim_{\delta X \to 0} \frac{1}{\delta X} \int_{-\infty}^{\infty} dx\; \chi_{\frac12\delta X}(x - X)\, f(x). \tag{5.11} \]



We now apply this result to the integral over $x_2$ in equation (5.8). We change
the variable of integration $x_2$ to $x = x_1 + x_2$, so the function $p(x_1, x - x_1)$
plays the role of the function $f(x)$ in equation (5.11), and thus

\[ p(x_1, X - x_1) = \lim_{\delta X \to 0} \frac{1}{\delta X} \int_{-\infty}^{\infty} dx_2\; \chi_{\frac12\delta X}(x_1 + x_2 - X)\, p(x_1, x_2). \tag{5.12} \]

Comparing equations (5.12) and (5.8), we find

\[ p_X(X) = \int_{-\infty}^{\infty} dx_1\; p(x_1, X - x_1). \tag{5.13} \]

This is a general expression for the probability density of $X = x_1 + x_2$
when $x_1$ and $x_2$ have a joint probability density $p(x_1, x_2)$. In the case
where $x_1$ and $x_2$ are independent, the joint probability density factorises:
$p(x_1, x_2) = p_1(x_1)\,p_2(x_2)$. Substituting this expression into equation (5.13),
the probability density $p_X(X)$ is seen to be given by the convolution integral
(5.5).


The following exercise shows how equations (5.5) and (5.6) can be extended 
to express the probability density of the sum of N identically distributed 
random variables in terms of the probability density of each variable. 


Exercise 5.6 


Let $X_N$ be the sum of $N$ independent and identically distributed random variables
$x_i$, each of which has probability density $p$. Use equation (5.6) to deduce that
the probability density, $p_{X_N}$, of the random variable $X_N$ is given by repeated
convolution of the function $p$ with itself, i.e.

\[ p_{X_N} = \underbrace{p \otimes p \otimes \cdots \otimes p}_{N-1\ \text{convolutions}}. \]

The notion of taking repeated convolution of a function may appear to be a little
daunting. According to the definition introduced in Block I, Chapter 3, Section
3.5, the function $p_{X_2} = p \otimes p$ is obtained from $p(x)$ by performing a single integral.
The function $p_{X_3} = p_{X_2} \otimes p = p \otimes p \otimes p$ is obtained from the function $p_{X_2}$ and $p$
by performing a further integration, and so on. However, you will see that the use
of the convolution theorem enables us to avoid having to perform these multiple
integrals.


The following example and exercise illustrate the application of equation
(5.5).


Example 5.2 


Let $X = x_1 + x_2$, where $x_1$ and $x_2$ are independent Gaussian random
variables with mean values $\langle x_1\rangle$, $\langle x_2\rangle$, and variances $\sigma_1^2$, $\sigma_2^2$, respectively. Show
that $X$ has a Gaussian distribution. Determine its mean value $\langle X\rangle$ and
variance $\sigma_X^2$. (This result is illustrated in Figure 5.3.)


Deduce that the sum of N independent Gaussian random variables is also a 
Gaussian random variable. Why is the central limit theorem a much stronger 
result than this? 


Hint: Again, this exercise 
requires proof by 
induction. 




Solution 


The result can be obtained from equation (5.5) by calculating the
convolution of two Gaussian functions directly. It can also be obtained by Fourier
transforming, using the convolution theorem, and evaluating the inverse
Fourier transform. In this case it is more convenient to follow the latter
approach, which avoids doing any calculation.


Observe that each probability distribution is a Gaussian function of the form
$p(x) = A\exp[-(x - \mu)^2/2\sigma^2]$, where $\mu$, $\sigma^2$ are the mean and variance, and
$A$ is chosen to normalise the distribution. Now we use results from Block I,
Chapter 3, in particular the result that the Fourier transform of a Gaussian is
a Gaussian (Exercise 3.18). We find that the Fourier transform of each
probability density is a Gaussian of the form $\tilde p(k) = a\exp(\mathrm{i}bk)\exp(-k^2/2c^2)$,
where $a$, $b$, $c$ are constants which we need not determine. The probability
density of the sum $X = x_1 + x_2$ is the convolution of the probability
densities of each variable. Using the convolution theorem, the Fourier transform
of the probability density $p_X$ of the sum is proportional to the product of the
Fourier transforms of the probability densities of each variable. We therefore
have (using equation (3.51) of Block I, Chapter 3)

\[ \tilde p_X(k) = \sqrt{2\pi} \times a_1\exp(\mathrm{i}b_1 k)\exp(-k^2/2c_1^2) \times a_2\exp(\mathrm{i}b_2 k)\exp(-k^2/2c_2^2)
 = a_X\exp(\mathrm{i}b_X k)\exp(-k^2/2c_X^2), \]

where $a_X$, $b_X$ and $c_X$ are three constants that can be determined in terms of
$a_1$, $b_1$, $c_1$ and $a_2$, $b_2$, $c_2$. The formulae for these constants are not significant
here; the important point is that the Fourier transform of $p_X$ has the same
form as the Fourier transform of a Gaussian function. We conclude that
$X$ is Gaussian distributed. The mean and variance are obtained from the
results of Exercise 5.1, i.e. $\langle X\rangle = \langle x_1\rangle + \langle x_2\rangle$ and $\sigma_X^2 = \sigma_1^2 + \sigma_2^2$.


It follows by induction that the sum of $N$ independent Gaussian random
variables is Gaussian, and from Exercise 5.2 we see that the mean and
variance are the sums of, respectively, the means and variances of each
component. The result is therefore consistent with the central limit theorem.
It is much weaker however, in the sense that the central limit theorem states
that the probability density of the sum approaches a Gaussian even in cases
where the elements of the sum have a non-Gaussian distribution.
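
The conclusion of this example can also be confirmed by evaluating the convolution integral numerically. The sketch below assumes two Gaussian densities with arbitrarily chosen means and variances (these values are hypothetical and are not the parameters of Figure 5.3), approximates the convolution at a few values of $X$ by a Riemann sum, and compares the result with a Gaussian whose mean and variance are the sums of those of the two components.

```python
import math

def gaussian(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def convolution_at(X, mean1, var1, mean2, var2, h=0.01, half_width=10.0):
    """Riemann sum for the integral over x1 of p1(x1) * p2(X - x1)."""
    steps = int(round(2.0 * half_width / h))
    total = 0.0
    for i in range(steps + 1):
        x1 = mean1 - half_width + i * h
        total += gaussian(x1, mean1, var1) * gaussian(X - x1, mean2, var2)
    return total * h

# Hypothetical parameters, chosen only for illustration.
mean1, var1 = 0.5, 0.25
mean2, var2 = 1.5, 0.5

differences = [
    abs(convolution_at(X, mean1, var1, mean2, var2)
        - gaussian(X, mean1 + mean2, var1 + var2))
    for X in (0.0, 2.0, 3.5)
]
print(differences)
```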


Figure 5.3 Illustrating the conclusion of Example 5.2: the probability density $p_X$
of the sum of two Gaussian variables $x_1$ and $x_2$ is also a Gaussian. The parameters


in this example are (x1) = 4, (2) = 3, of = $, 03 = zy, 80 (X) = 2 and o% = Fj. 




Exercise 5.7 


Two independent random variables $x_1$ and $x_2$ both take values between 0 and 1 with
a uniform probability density. Show that the probability density of $X = x_1 + x_2$ is
a function with a triangular graph, as illustrated in Figure 5.4.


Figure 5.4 Illustrating the result of Exercise 5.7: this is the probability density $p_X$
of the sum of two random variables, each of which has a uniform distribution
$p(x) = 1$ on the interval $[0, 1]$.


You have seen that the probability density of the sum of two independent
random variables is given by the convolution $p_X = p_1 \otimes p_2$. If you studied
the solution to Example 5.2, you will have already seen the convolution
theorem used to evaluate the convolution of two Gaussian probability densities;
now we shall consider the application of the convolution theorem in a more
general context. Let $\tilde p_1(k)$, $\tilde p_2(k)$ and $\tilde p_X(k)$ be the Fourier transforms of
the probability densities $p_1$, $p_2$ and $p_X$, respectively. Applying the
convolution theorem to equation (5.6) shows that the Fourier transforms of these
probability densities are related by multiplication:

\[ \tilde p_X(k) = \sqrt{2\pi}\; \tilde p_1(k)\, \tilde p_2(k). \tag{5.14} \]

Multiplication is a simpler operation than convolution, so this looks like a
promising approach to understanding the form of the function $\tilde p_X(k)$.


If $X$ is the sum of $N$ independent random variables, all having the same
probability density $p$, the probability density of $X$ was shown (in Exercise
5.6) to be

\[ p_X = \underbrace{p \otimes p \otimes \cdots \otimes p}_{N-1\ \text{convolutions}}. \tag{5.15} \]

By repeated use of the convolution theorem, the Fourier transform of $p_X$ is

\[ \tilde p_X(k) = (2\pi)^{(N-1)/2}\, [\tilde p(k)]^N. \tag{5.16} \]


It is certainly much easier to analyse repeated multiplication than repeated
application of convolution integrals, and we shall now try to determine
$p_X(X)$ by using equation (5.16) to determine $\tilde{p}_X(k)$, and finding the inverse
Fourier transform. This suggests that we should investigate the form
of the $N$th power of some function, $f(x)$ say, in the limit where $N$ is large.
We do this in the next section, and find that (under very general conditions)
$[f(x)]^N$ is well approximated by a Gaussian when $N \gg 1$. This result
is the key to understanding the central limit theorem: we know that the
Fourier transform of a Gaussian function is also a Gaussian function (see
Exercise 3.18 of Block I, Chapter 3), so $p_X(X)$ is also expected to be well
approximated by a Gaussian function.
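The repeated convolution in equation (5.15) can also be explored numerically. The following Python sketch is only an illustration under stated assumptions: it takes the uniform density on $[0, 1]$ as the single-variable density, samples it at cell midpoints, and replaces each convolution integral by a Riemann sum. The function names and the grid size are hypothetical choices, not part of the course text. The mean $N/2$ and variance $N/12$ used for the Gaussian are those found in Solution 5.3.

```python
import math

def convolve(p, q, h):
    """Riemann-sum approximation to the convolution of two sampled densities.

    p and q hold samples on uniform grids of spacing h; entry m of the
    result collects every product p[i] * q[j] with i + j == m.
    """
    r = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            r[i + j] += p_i * q_j * h
    return r

def sum_density(n_terms, n_cells=100):
    """Sampled density of the sum of n_terms uniform variables on [0, 1].

    The single-variable density is sampled at the n_cells cell midpoints,
    so entry m of the result belongs to X = (m + n_terms / 2) * h.
    """
    h = 1.0 / n_cells
    p = [1.0] * n_cells
    p_sum = p
    for _ in range(n_terms - 1):          # N - 1 convolutions, as in (5.15)
        p_sum = convolve(p_sum, p, h)
    return p_sum, h

def gaussian(X, mean, variance):
    """Gaussian probability density with the given mean and variance."""
    return math.exp(-(X - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

N = 4
p4, h = sum_density(N)
for X in (1.0, 1.5, 2.0, 2.5, 3.0):
    m = round(X / h - N / 2)              # grid index belonging to this X
    print(X, round(p4[m], 4), round(gaussian(X, N / 2, N / 12), 4))
```

Each printed row gives $X$, the numerically convolved density and the Gaussian with the same mean and variance, so the two columns can be compared directly; increasing `N` should bring them closer together.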




Hint: In this case you will 
find it easier to calculate 
the convolution directly by 
integration, rather than 
through the convolution 
theorem. The solution of 
Exercise 3.22 in Block I, 
Chapter 3, may be useful. 


You might find it useful to 
review the convolution 
theorem at this point: see 
Block I, Chapter 3, Section 
Dos 


For notational convenience,
we now drop the subscript
$N$ from $X_N$ in the results
of Exercise 5.6.


5.4 The Gaussian approximation to $[f(x)]^N$ (Optional)


Consider a real-valued function $f(x)$, whose magnitude $|f(x)|$ has a finite
global maximum at $x_0$ (that is, $|f(x_0)|$ is the largest value of $|f(x)|$ for any
real $x$). We assume that the function is twice-differentiable at $x_0$. The first
derivative of this function therefore vanishes at $x_0$ (that is, $f'(x_0) = 0$), so
in the vicinity of $x_0$ it may be approximated by

$$ f(x) = f_0 + \tfrac{1}{2} f_0''\, \delta x^2 + O(\delta x^3), \tag{5.17} $$

where $\delta x = x - x_0$, $f_0 = f(x_0)$ and $f_0'' = f''(x_0)$. In the following we also
assume that the second derivative is non-zero at $x_0$. We consider the form
of the function $[f(x)]^N$, where $N$ is a large positive integer. We aim to show
that this is very well approximated by a Gaussian function (which will be
obtained as equation (5.25) below). This result is illustrated in Figure 5.5,
which shows $f(x)$, $[f(x)]^3$, $[f(x)]^{10}$, and Gaussian approximations $g_N(x)$ to
the functions $[f(x)]^N$, for the case where $f(x)$ is a Lorentzian function

$$ f(x) = \frac{1}{1 + x^2}. \tag{5.18} $$

We see that the Gaussian approximation is very good for $N = 3$, and almost
indistinguishable from the function itself for $N = 10$.


Figure 5.5 Graphs of $f(x) = 1/(1 + x^2)$, $[f(x)]^3$, $[f(x)]^{10}$, and Gaussian
approximations to the latter two functions. The Gaussian approximation to
$[f(x)]^N$ is $g_N(x) = \exp(-Nx^2)$.


Figure 5.6 shows similar graphs for the case where

$$ f(x) = \mathrm{sinc}(x) = \begin{cases} \sin(x)/x, & x \neq 0, \\ 1, & x = 0. \end{cases} \tag{5.19} $$

Again, the Gaussian approximations are seen to be very accurate when $N$
is large.




The material in this section 
will not be assessed. 




Figure 5.6 Graphs of $f(x) = \mathrm{sinc}(x)$ and two higher powers of $f(x)$, together with
Gaussian approximations to the latter two functions. Here the Gaussian approximations are
$g_N(x) = \exp(-Nx^2/6)$.


We now show that $[f(x)]^N$ can be approximated by a Gaussian function
when $N$ is large. It is convenient to divide $f(x)$ by its value where $|f(x)|$
is a global maximum, so that we consider a function whose maximum
value is unity. We define functions $F$ and $F_N$ as follows:

$$ F(x) = \frac{f(x)}{f(x_0)}, \qquad F_N(x) = [F(x)]^N. \tag{5.20} $$

By definition, $F(x)$ is equal to unity when $x = x_0$, and $|F_N(x)| < 1$ everywhere
except at $x_0$. For any number $a$ with $|a| < 1$, we have $\lim_{N \to \infty} a^N = 0$,
so in the limit as $N \to \infty$, $F_N(x) \to 0$ everywhere except at $x = x_0$. When
$N \gg 1$, $F_N(x)$ is approximately zero, except for a very small interval in the
vicinity of the point $x = x_0$, where $F_N(x_0) = 1$. We therefore consider in
detail an approximation to $F_N(x)$ which is valid in the vicinity of $x = x_0$.
We write

$$ F_N(x) = \exp\left(N \ln F(x)\right), \tag{5.21} $$


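which is justified in the region of interest around the maximum of $|f(x)|$ at
$x_0$, where $F(x)$ is positive. We now expand $F(x)$ in powers of $\delta x = x - x_0$,
noting that $F'(x_0) = 0$ because $x_0$ is a maximum. Defining $F_0 = F(x_0)$ and
$F_0'' = F''(x_0)$, we have

$$ \begin{aligned} F_N(x) &= \exp\left(N \ln\left[F_0 + \tfrac{1}{2} F_0''\, \delta x^2 + O(\delta x^3)\right]\right) \\ &= \exp\left(N \ln\left[1 - \tfrac{1}{2} a\, \delta x^2 + O(\delta x^3)\right]\right), \end{aligned} \tag{5.22} $$

since $F_0 = 1$, and where $a = -F_0''$ is a positive number because $F(x)$ has a
maximum at $x = x_0$ with $f''(x_0) \neq 0$. We now use the relations

$$ \begin{aligned} \ln(1 + x) &= x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \cdots, \\ \exp(x + y) &= \exp(x)\exp(y), \\ \exp(x) &= 1 + x + \tfrac{1}{2}x^2 + \cdots \end{aligned} \tag{5.23} $$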

to simplify equation (5.22). First we expand the logarithm, then factor
the exponential, and finally expand the exponential function containing the
error term:

$$ \begin{aligned} F_N(x) &= \exp\left[-\tfrac{1}{2} aN\, \delta x^2 + N\, O(\delta x^3)\right] \\ &= \exp\left(-\tfrac{1}{2} aN\, \delta x^2\right) \exp\left[N\, O(\delta x^3)\right] \\ &= \exp\left(-\tfrac{1}{2} aN\, \delta x^2\right) \left[1 + N\, O(\delta x^3)\right]. \end{aligned} \tag{5.24} $$

Thus $F_N(x)$ may be approximated by a Gaussian function, with a relative
error which is small when $N\, \delta x^3$ is small. The Gaussian approximation
is therefore valid inside a small interval centred on $x_0$, of half-width
$\Delta x \sim N^{-1/3}$. At the boundaries of this interval, however, the Gaussian
approximation is small, since $\exp(-\tfrac{1}{2} aN \Delta x^2) \sim \exp(-\tfrac{1}{2} aN^{1/3})$ approaches
zero in the limit as $N \to \infty$. Thus the Gaussian approximation (5.24) is
valid inside the small interval $[x_0 - \Delta x, x_0 + \Delta x]$, whereas outside this interval
both the Gaussian function and $F_N(x)$ are approximately zero. We
conclude that when $N \gg 1$, the Gaussian form

$$ [f(x)]^N \approx [f(x_0)]^N \exp\left[-\tfrac{1}{2} aN (x - x_0)^2\right], \tag{5.25} $$

where $a = -f_0''/f_0$, is a very good approximation for all values of $x$.
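As a rough numerical check of equation (5.25), the sketch below compares $[f(x)]^N$ with its Gaussian form for the Lorentzian (5.18) and for the sinc function (5.19). It is an illustration only: the helper names, the sample grid on $[-5, 5]$ and the chosen values of $N$ are assumptions made here, while the values $a = 2$ and $a = \tfrac{1}{3}$ are those obtained in Solution 5.8.

```python
import math

def lorentzian(x):
    """The Lorentzian function of equation (5.18)."""
    return 1.0 / (1.0 + x * x)

def sinc(x):
    """The sinc function of equation (5.19)."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def gaussian_form(x, N, a, x0=0.0, f0=1.0):
    """Right-hand side of (5.25): f0**N * exp(-a * N * (x - x0)**2 / 2)."""
    return f0 ** N * math.exp(-0.5 * a * N * (x - x0) ** 2)

def max_abs_error(f, a, N, xs):
    """Largest |f(x)**N - Gaussian form| over the sample points xs."""
    return max(abs(f(x) ** N - gaussian_form(x, N, a)) for x in xs)

xs = [0.01 * i for i in range(-500, 501)]      # sample grid on [-5, 5]
for N in (3, 10, 100):
    print(N, max_abs_error(lorentzian, 2.0, N, xs), max_abs_error(sinc, 1.0 / 3.0, N, xs))
```

The printed maximum absolute errors give one way of quantifying how the agreement seen in Figures 5.5 and 5.6 improves as $N$ grows.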


Exercise 5.8 


Calculate the Gaussian approximations for the functions used in Figures 5.5 and 
5.6, and confirm that your results agree with the functions quoted in the captions. 


[Hint: Because the function $\sin(x)/x$ is undefined at $x = 0$, you may find it useful
to write down the Taylor series for $\sin x$ and use this to give the Taylor series of
$\mathrm{sinc}(x)$. The second derivative which is required is easily extracted from this series.]


We can state our conclusions about the validity of the Gaussian approximation
more formally. Consider the accuracy of the Gaussian approximation (5.24)
when $|\delta x| < N^{\eta}$, where $\eta$ is some number which we choose for our convenience.
We shall choose $\eta$ such that the error term in (5.24) becomes small compared
to unity as $N \to \infty$. This means that $N^{1+3\eta} \ll 1$ for $N \gg 1$, so $1 + 3\eta < 0$, i.e.
$\eta < -\tfrac{1}{3}$.

The Gaussian function takes the value $\exp(-\tfrac{1}{2} aN^{1+2\eta})$ when $|\delta x| = N^{\eta}$, and
this becomes small in the limit as $N \to \infty$ when $1 + 2\eta > 0$, that is, when
$\eta > -\tfrac{1}{2}$. Thus we may select $-\tfrac{1}{2} < \eta < -\tfrac{1}{3}$, and find that as $N \to \infty$, the
Gaussian approximation is valid even when $\delta x$ is sufficiently large that the
Gaussian function $\exp(-\tfrac{1}{2} aN \delta x^2)$ is very small.


We shall also need a Gaussian approximation in the case where $f$ is a
complex-valued function of a real variable $x$, for which the global maximum
of $|f(x)|$ is at $x_0$. In this case, the function $f(x)$ itself need not be
stationary at $x_0$, despite the fact that $|f(x)|$ is stationary there: this point is
considered in Exercise 5.9 below. Again, it is convenient to divide by $f(x_0)$,
and we write $F(x) = f(x)/f(x_0)$ in the form

$$ F(x) = 1 + \frac{f_0'}{f_0}\, \delta x + \frac{1}{2} \frac{f_0''}{f_0}\, \delta x^2 + O(\delta x^3). \tag{5.26} $$

The linear term is purely imaginary, as shown in the following exercise.


Exercise 5.9 


Show that the condition that $|f|$ is stationary at $x_0$ implies that $f'/f$ is purely
imaginary at that point.


Now let us consider the form of the Gaussian approximation to $F_N(x) = [F(x)]^N$
when $F(x)$ is the complex-valued function in equation (5.26). We
can follow quite closely the steps of the earlier calculation for real functions.
First we write $f_0' = i\theta' f_0$ (where in Exercise 5.9 we saw that $\theta'$ is real), and
$f_0'' = -a f_0$ (where, in general, $a$ does not have to be real). Then equation
(5.21) gives

$$ \begin{aligned} F_N(x) &= \exp\left[N \ln\left(1 + (f_0'/f_0)\, \delta x + \tfrac{1}{2}(f_0''/f_0)\, \delta x^2 + O(\delta x^3)\right)\right] \\ &= \exp\left[N \ln\left(1 + i\theta'\, \delta x - \tfrac{1}{2} a\, \delta x^2 + O(\delta x^3)\right)\right]. \end{aligned} \tag{5.27} $$

Now we use the Taylor expansion of the logarithm, from equations (5.23):

$$ F_N(x) = \exp\left[iN\theta'\, \delta x - \tfrac{1}{2} N a\, \delta x^2 + \tfrac{1}{2} N \theta'^2\, \delta x^2 + N\, O(\delta x^3)\right]. \tag{5.28} $$




Hint: Write $f = R\exp(i\theta)$,
where $R$ and $\theta$ are
real-valued functions of $x$.




Finally, we use the results on the exponential function, also in equations
(5.23), to obtain

$$ F_N(x) = \exp(iN\theta'\, \delta x)\, \exp\left[-\tfrac{1}{2} N (a - \theta'^2)\, \delta x^2\right] \left[1 + O(N \delta x^3)\right]. \tag{5.29} $$

This resembles the result for real functions, but there is an additional factor
$\exp(iN\theta'\, \delta x)$, and $a$ is replaced by $a - \theta'^2$.
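To see the extra phase factor and the replacement of $a$ by $a - \theta'^2$ at work, the following sketch applies equation (5.29) to a hypothetical test function, a Lorentzian multiplied by the pure phase $\exp(ix)$. The test function, the finite-difference step and all names are assumptions chosen for illustration; for this choice one expects $\theta' = 1$ and $a = 3$, so that $a - \theta'^2 = 2$.

```python
import cmath

def f(x):
    """Hypothetical test function: a Lorentzian times a pure phase factor."""
    return cmath.exp(1j * x) / (1.0 + x * x)

def expansion_coefficients(func, x0, h=1e-4):
    """Estimate theta' and a of (5.27) at x0 by central finite differences."""
    f0 = func(x0)
    d1 = (func(x0 + h) - func(x0 - h)) / (2.0 * h)
    d2 = (func(x0 + h) - 2.0 * f0 + func(x0 - h)) / (h * h)
    theta_prime = (d1 / f0 / 1j).real      # from f0' = i * theta' * f0
    a = -d2 / f0                           # from f0'' = -a * f0
    return theta_prime, a

def gaussian_form(x, N, theta_prime, a, x0=0.0):
    """Leading factors of (5.29), without the [1 + O(N dx^3)] correction."""
    dx = x - x0
    phase = cmath.exp(1j * N * theta_prime * dx)
    envelope = cmath.exp(-0.5 * N * (a - theta_prime ** 2) * dx * dx)
    return phase * envelope

theta_prime, a = expansion_coefficients(f, 0.0)
N = 10
for x in (0.0, 0.2, 0.4, 0.6):
    print(x, f(x) ** N, gaussian_form(x, N, theta_prime, a))
```

Estimating the coefficients numerically, rather than inserting them by hand, keeps the sketch usable for other test functions whose maximum of $|f|$ lies at the chosen `x0`.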


5.5 Fourier transform of a probability 
density (Optional) 


You have seen in Section 5.3 that the probability density $p_X$ for a sum of
$N$ identically distributed random variables is given by repeated convolution
of the probability density for a single variable, $p$. The convolution theorem
shows that the Fourier transforms of these distributions are related by equation
(5.16), that is, $\tilde{p}_X = (2\pi)^{(N-1)/2}\, \tilde{p}^N$. In the previous section you saw
that $f^N$ approaches a Gaussian as $N \to \infty$, under quite general conditions.
This suggests that $\tilde{p}_X$ may approach a Gaussian function. Because the inverse
Fourier transform of a Gaussian is also a Gaussian function, this would
lead to a justification of the central limit theorem. In order to apply this,
it is necessary to characterise the properties of the Fourier transform, $\tilde{p}(k)$,
of a probability distribution function $p(x)$. We shall need to establish two
results concerning the Fourier transform of the probability density. First,
we need to show that the Taylor expansion of $\tilde{p}(k)$ is related to the moments
of the random variable $x$. Secondly, we need to show that the magnitude of
the Fourier transform $\tilde{p}(k)$ has a global maximum at $k = 0$.


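Because the probability density is normalised, the Fourier transform

$$ \tilde{p}(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, \exp(-ikx)\, p(x) \tag{5.30} $$

takes the value $1/\sqrt{2\pi}$ at $k = 0$. By differentiating equation (5.30), we
establish that derivatives of $\tilde{p}(k)$ at $k = 0$ are related to the moments of the
probability density; for example,

$$ \begin{aligned} \left.\frac{d\tilde{p}}{dk}\right|_{k=0} &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, p(x) \left.\frac{d}{dk}\exp(-ikx)\right|_{k=0} \\ &= -\frac{i}{\sqrt{2\pi}} \left.\int_{-\infty}^{\infty} dx\, x \exp(-ikx)\, p(x)\right|_{k=0} \\ &= -\frac{i}{\sqrt{2\pi}} \langle x \rangle. \end{aligned} \tag{5.31} $$

Similarly, we find

$$ \left.\frac{d^2\tilde{p}}{dk^2}\right|_{k=0} = -\frac{1}{\sqrt{2\pi}} \langle x^2 \rangle. \tag{5.32} $$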
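Equations (5.31) and (5.32) can be tried out numerically. The sketch below is a minimal illustration under stated assumptions: the density is taken to be uniform on $[0, 1]$ (as in Exercise 5.3), the transform (5.30) is estimated with a midpoint rule, and the derivatives at $k = 0$ are estimated by finite differences. The function names, grid size and step `dk` are hypothetical choices.

```python
import cmath
import math

def fourier_transform(p, k, lower, upper, n=2000):
    """Midpoint-rule estimate of (5.30) for a density p supported on [lower, upper]."""
    h = (upper - lower) / n
    total = 0.0 + 0.0j
    for i in range(n):
        x = lower + (i + 0.5) * h
        total += cmath.exp(-1j * k * x) * p(x) * h
    return total / math.sqrt(2.0 * math.pi)

def first_two_moments(p, lower, upper, dk=1e-3):
    """Recover <x> and <x^2> from finite-difference derivatives of the transform at k = 0."""
    t_minus = fourier_transform(p, -dk, lower, upper)
    t_zero = fourier_transform(p, 0.0, lower, upper)
    t_plus = fourier_transform(p, dk, lower, upper)
    d1 = (t_plus - t_minus) / (2.0 * dk)
    d2 = (t_plus - 2.0 * t_zero + t_minus) / (dk * dk)
    mean = (1j * math.sqrt(2.0 * math.pi) * d1).real     # rearranged (5.31)
    second = (-math.sqrt(2.0 * math.pi) * d2).real       # rearranged (5.32)
    return mean, second

def uniform(x):
    """Uniform density on [0, 1]; the integration limits enforce the support."""
    return 1.0

mean, second = first_two_moments(uniform, 0.0, 1.0)
print(mean, second)
```

For the uniform density the two printed numbers should be close to the moments $\tfrac{1}{2}$ and $\tfrac{1}{3}$ found in Solution 5.3.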


The material in this section 
will not be assessed. 


The concept of 
normalisation of a 
probability density was 
introduced in Subsection 
Las. 


Here the notation $f(k)|_{k=0}$
means ‘the function $f(k)$
evaluated at $k = 0$’.




Exercise 5.10 


Derive equation (5.32). Write down an expression for $\langle x^n \rangle$ (with $n > 0$ an integer)
in terms of derivatives of $\tilde{p}(k)$.


It can be shown that the magnitude of $\tilde{p}(k)$ is greatest at $k = 0$. (This
is expected because $p(x)$ is nowhere negative, and the oscillations of the
function $\exp(ikx)$ cause cancellations when $k \neq 0$.) The following exercise
provides the proof.


Exercise 5.11 


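Show that $|\tilde{p}(k)|^2$ has a global maximum at $k = 0$.

[Hint: Show that

$$ |\tilde{p}(0)|^2 - |\tilde{p}(k)|^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, \left[1 - \cos(k(x - y))\right] p(x)\, p(y), \tag{5.33} $$

and show that this integral is never negative.

Note that $|\tilde{p}(k)|^2 = \tilde{p}(k) \times [\tilde{p}(k)]^*$, and express both $\tilde{p}(k)$ and its complex conjugate
as integrals involving the function $p(x)$.]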


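It follows that if $\tilde{p}(k)$ is expanded as a Taylor series about $k = 0$, the coefficients
are related to the moments of $p(x)$:

$$ \tilde{p}(k) = \tilde{p}(0) + \left.\frac{d\tilde{p}}{dk}\right|_{k=0} k + \frac{1}{2} \left.\frac{d^2\tilde{p}}{dk^2}\right|_{k=0} k^2 + O(k^3). \tag{5.34} $$

From the result of Exercise 5.10, we know that in general the coefficient of
$k^n$ is proportional to the moment $\langle x^n \rangle$, so

$$ \tilde{p}(k) = \frac{1}{\sqrt{2\pi}} \left[1 - i\langle x \rangle k - \tfrac{1}{2}\langle x^2 \rangle k^2 + O(k^3)\right] \tag{5.35} $$

(provided that the moments $\langle x \rangle$ and $\langle x^2 \rangle$ exist).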


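This expression (5.35) is of the same form as (5.26), and we have seen that
$k = 0$ is the global maximum of $|\tilde{p}(k)|$. We may therefore use equation (5.27)
to give an approximation for $[\tilde{p}(k)]^N$: we see we must substitute $\delta x = k$,
$f_0 = 1/\sqrt{2\pi}$, $\theta' = -\langle x \rangle$, and $a = \langle x^2 \rangle$, so, from equation (5.29), we have

$$ [\tilde{p}(k)]^N = (2\pi)^{-N/2} \exp(-iN\langle x \rangle k)\, \exp\left[-\tfrac{1}{2} N \left(\langle x^2 \rangle - \langle x \rangle^2\right) k^2\right] \left[1 + O(Nk^3)\right]. \tag{5.36} $$

Using equation (5.16), we multiply by $(2\pi)^{(N-1)/2}$ to obtain an approximation
to $\tilde{p}_X(k)$, and evaluate the inverse Fourier transform. Noting that most
of the factors of $2\pi$ cancel, we find

$$ p_X(X) \approx \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \exp\left[ik(X - N\langle x \rangle) - \tfrac{1}{2} N \sigma^2 k^2\right] \tag{5.37} $$

$$ \phantom{p_X(X)} = \frac{1}{\sqrt{2\pi N}\, \sigma} \exp\left[-\frac{(X - N\langle x \rangle)^2}{2N\sigma^2}\right], \tag{5.38} $$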




Note that this exercise is 
harder than average. 


Here we have used the 
Fourier transform of the 
Gaussian function, from 
Block I, Chapter 3, 
Exercise 3.18. 




where $\sigma^2 = \langle x^2 \rangle - \langle x \rangle^2$. This is a normal distribution, with mean and variance
in agreement with equations (5.3) and (5.4). We shall not discuss the
error of this approximation.
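Although the error is not analysed here, equation (5.36) can at least be compared with an exactly known transform. The sketch below does this for the uniform density on $[0, 1]$, whose transform follows in closed form from (5.30) by direct integration; that closed form, the sample grid in $k$ and the function names are assumptions introduced for illustration, while the mean $\tfrac{1}{2}$ and variance $\tfrac{1}{12}$ are those of Solution 5.3.

```python
import cmath
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def uniform_transform(k):
    """Transform (5.30) of the uniform density on [0, 1], in closed form."""
    if k == 0.0:
        return 1.0 / SQRT_2PI
    return (1.0 - cmath.exp(-1j * k)) / (1j * k) / SQRT_2PI

def exact_sum_transform(k, N):
    """Equation (5.16): (2*pi)**((N - 1)/2) times the Nth power of the transform."""
    return (2.0 * math.pi) ** ((N - 1) / 2.0) * uniform_transform(k) ** N

def gaussian_sum_transform(k, N, mean=0.5, variance=1.0 / 12.0):
    """Equation (5.36) multiplied by (2*pi)**((N - 1)/2), without the error factor."""
    return cmath.exp(-1j * N * mean * k) * math.exp(-0.5 * N * variance * k * k) / SQRT_2PI

def worst_error(N, ks):
    """Largest modulus of the difference between the two transforms on the grid ks."""
    return max(abs(exact_sum_transform(k, N) - gaussian_sum_transform(k, N)) for k in ks)

ks = [0.05 * i for i in range(-400, 401)]      # sample grid on [-20, 20]
for N in (2, 10, 50):
    print(N, worst_error(N, ks))
```

The modulus of the exact transform here is a power of a sinc function, which connects this check with the Gaussian approximation illustrated in Figure 5.6.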


We assumed that the moments $\langle x \rangle$ and $\langle x^2 \rangle$ exist. It may be possible to
extend the series (5.35) to include terms of higher order in $k$, but in many
cases the higher moments are infinite. We conclude this section by briefly
considering the condition which determines whether moments exist. The
crucial issue is how rapidly the function $p(x)$ decreases as $x \to \pm\infty$. If $p(x)$
decreases too slowly, the integral defining the moment will be divergent.
The calculations are left to the following exercise, which is quite hard. If
you are not familiar with manipulating inequalities involving integrals, you
might like to turn to the solution directly.


Exercise 5.12 


Consider a random variable which takes only positive values, in the case where the
probability density can be bounded so that when $x$ is larger than some constant
$x_0$,

$$ p(x) < a x^{-\beta}. \tag{5.39} $$

Show that the moment $\langle x^n \rangle$ exists provided that $\beta > n + 1$. Conversely, show that
if

$$ p(x) > a x^{-\beta} \tag{5.40} $$

when $x > x_0$, the moment $\langle x^n \rangle$ does not exist when $n > \beta - 1$.


This exercise shows that continuing to expand $\tilde{p}(k)$ as a Taylor series may
not be meaningful, because the coefficients are not well defined. However,
our derivation of the central limit theorem requires only that $\langle x \rangle$ and $\langle x^2 \rangle$
exist.


5.6 Summary 


This chapter started by describing the central limit theorem, and illustrating 
its applications. Very often a random variable is the sum of a large num- 
ber of independent influences with similar magnitudes, and the central limit 
theorem indicates that we should expect such a random variable to have a 
probability density which is close to a Gaussian function. This underlies the 
fact that Gaussian probability densities are so commonly encountered that 
they are termed ‘normal distributions’. We illustrated some applications of 
the central limit theorem, in which you are given information about the distribution
of the component random variables $x_i$, and are asked to determine
the Gaussian probability density of their sum $X$. The normal probability
function $\mathcal{N}(x)$, tabulated in Section 1.3, was used to make statements about
the probability that $X$ will lie inside a given interval.


We also discussed an approach to explaining how the central limit theorem
arises. There are several stages to the argument, and it may be helpful to
summarise these.

• We showed that the probability density for the sum of two independent
random variables, $X = x_1 + x_2$, is given by the convolution of their
individual probability densities: $p_X = p_1 \otimes p_2$.


This is another harder 
exercise. 


Hint: The proof involves
writing $\langle x^n \rangle$ as an integral,
and splitting the region of
integration into regions
where the integrand can be
bounded.




• The convolution theorem then implies that the Fourier transforms of the
probability densities are related by multiplication: $\tilde{p}_X = \sqrt{2\pi}\, \tilde{p}_1\, \tilde{p}_2$.

• We extended this to the sum of $N$ random variables $x_i$ all having the
same probability density $p(x)$, the probability density of the sum
$X = \sum_{i=1}^{N} x_i$ being $p_X(X)$. We found that the Fourier transforms of the
probability densities are related by $\tilde{p}_X(k) = (2\pi)^{(N-1)/2}\, [\tilde{p}(k)]^N$.

• If $f(x)$ is a function with a global maximum at $x_0$ (and satisfying some
other conditions which usually hold), we showed that in the limit as
$N \to \infty$, $[f(x)]^N$ approaches a Gaussian function with its maximum at
$x_0$.

• Combining the previous two results, we concluded that $\tilde{p}_X(k)$ approaches
a Gaussian function as $N \to \infty$. Because the Fourier transform of a
Gaussian is also a Gaussian, we concluded that $p_X(X)$ also approaches
a Gaussian function.


At the start of the discussion we did not specify the conditions on the probability
density $p(x)$. We found that the coefficients of $k^n$ in the expansion
of $\tilde{p}(k)$ are proportional to the moments $\langle x^n \rangle$. We showed that for some
choices of $p(x)$, the moments might all be infinite after a certain value of $n$.
However, our calculation required only that the first two terms of the Taylor
series expansion of $\tilde{p}(k)$ exist. Thus, the central limit theorem is applicable
whenever both $\langle x \rangle$ and $\langle x^2 \rangle$ exist.


5.7 Outcomes 


After reading this chapter, you should:

• be aware of the scope of the central limit theorem, and be able to identify
situations where it can be applied;

• be able to write down the Gaussian approximation for the probability
density of the sum of independent random variables with the same mean
and variance;

• be able to use tables of the normal distribution function $\mathcal{N}(x)$ to determine
the probability that a Gaussian distributed random variable lies in
a given interval;

• be aware that the distribution of a sum of independent random variables
is the convolution of their individual distributions, and be able to
calculate these convolutions in simple cases;

• be aware that $[f(x)]^N$ approaches a Gaussian function as $N \to \infty$, under
very general conditions, and of how this fact is related to the central limit
theorem.




5.8 Further exercises 


The following harder exercises provide an illustration of the central limit
theorem for the case where the random variables which are summed have
the probability density

$$ p(x) = \begin{cases} \lambda \exp(-\lambda x), & x \ge 0, \\ 0, & x < 0, \end{cases} \tag{5.41} $$

which describes random intervals between events. This probability density
was derived in Exercise 1.32, and was also considered in Exercises 1.9, 1.16
and 5.4. It is called the Poisson distribution.


Exercise 5.13 


Consider $N$ independent random variables, each one of which has a Poisson distribution,
with probability density given by equation (5.41). In Exercise 5.4, you
found the mean and variance of this distribution: replacing $\tau$ by $1/\lambda$, these are
$\langle x \rangle = 1/\lambda$ and $\langle x^2 \rangle - \langle x \rangle^2 = 1/\lambda^2$, respectively.

Let $p_{X_N}(X)$ be the probability density for the sum $X_N$ of $N$ such random variables.
Show that this is given exactly by

$$ p_{X_N}(X) = \begin{cases} \dfrac{\lambda^N X^{N-1}}{(N-1)!} \exp(-\lambda X), & X \ge 0, \\ 0, & X < 0. \end{cases} \tag{5.42} $$

[Hint: Write $p_{X_{N+1}} = p_{X_N} \otimes p$, and show that if the equation above is valid for
$p_{X_N}(X)$, then it is also valid for $p_{X_{N+1}}(X)$. Checking that the result is valid for
$N = 1$ then implies that it is also valid for all $N \ge 1$. This is a further example of
proof by induction.]


Exercise 5.14 


Write the exact expression for the probability density $p_{X_N}(X)$ (valid for $X > 0$)
obtained in the previous exercise in the form $p_{X_N}(X) = \exp[-\phi_N(X)]$. Show that
the single minimum of $\phi_N$ is at $(N - 1)/\lambda$. Using Stirling’s formula (quoted in
Section 2.6), show that the Taylor series expansion of $\phi_N(X)$ about its minimum
may be approximated by

$$ \phi_N(X) = -\ln\lambda + \tfrac{1}{2}\ln[2\pi(N - 1)] + \frac{\lambda^2}{2(N - 1)} \left(X - \frac{N - 1}{\lambda}\right)^2. \tag{5.43} $$

Use this result to obtain a Gaussian approximation for $p_{X_N}(X)$.


Exercise 5.15 


Write down a Gaussian approximation to the probability density for the sum $X_N$
of $N$ independent random Poisson distributed variables, obtained from the central
limit theorem. How does this result compare with that obtained in the previous
exercise?

Calculate the exact values of $p_{X_N}(X)$ when $\lambda = 1$ and $N = 10$, at $X = 10$, $X = 13$
and $X = 20$, and compare with the two Gaussian approximations.




Solutions to Exercises in Chapter 5 


Solution 5.1 
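In Subsection 1.2.2 it was shown that the mean value of the sum is the sum of the
mean values, so $\langle X \rangle = \langle x_1 \rangle + \langle x_2 \rangle$.
The second moment of $X$ is

$$ \langle X^2 \rangle = \langle x_1^2 + 2x_1x_2 + x_2^2 \rangle = \langle x_1^2 \rangle + \langle x_2^2 \rangle + 2\langle x_1 \rangle\langle x_2 \rangle, $$

where we have used the result that, for independent variables, $\langle x_1x_2 \rangle = \langle x_1 \rangle\langle x_2 \rangle$;
see Exercise 1.21. Recalling the relationship between the variance and the second
moment given in equation (1.34), the variance of $X$ is

$$ \begin{aligned} \sigma_X^2 &= \langle X^2 \rangle - \langle X \rangle^2 \\ &= \langle x_1^2 \rangle + \langle x_2^2 \rangle + 2\langle x_1 \rangle\langle x_2 \rangle - \left(\langle x_1 \rangle + \langle x_2 \rangle\right)^2 \\ &= \langle x_1^2 \rangle - \langle x_1 \rangle^2 + \langle x_2^2 \rangle - \langle x_2 \rangle^2. \end{aligned} $$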


Solution 5.2 


Using equation (1.44), we find that the mean value of $X_N$ is the sum of the mean
values of each of its components:

$$ \langle X_N \rangle = \sum_{i=1}^{N} \langle x_i \rangle = N\langle x \rangle. $$

The variance of $X_N$ requires a more elaborate argument, which is an example of
proof by induction. If the variance of the sum of $N$ independent random variables
is $\sigma_{X_N}^2$, then the second result derived in Exercise 5.1 shows that $\sigma_{X_{N+1}}^2 = \sigma_{X_N}^2 + \sigma^2$,
because $X_N$ and $x_{N+1}$ are independent. Thus if the relation $\sigma_{X_N}^2 = N\sigma^2$ were true,
we would have $\sigma_{X_{N+1}}^2 = (N + 1)\sigma^2$. We have shown that if the relation $\sigma_{X_N}^2 = N\sigma^2$
is true for any choice of $N$, it is also true for $N + 1$. Repeating the argument
establishes that it is true for any integer greater than or equal to $N$. By definition
$\sigma_{X_1}^2 = \sigma^2$, so the relation $\sigma_{X_N}^2 = N\sigma^2$ is true for all $N \ge 1$.


Solution 5.3 


The probability density is equal to unity in the interval $[0, 1]$, and zero elsewhere
(you can easily check that this is normalised). Using equations (1.30) and (1.31),
the mean and second moment of each $x_i$ are

$$ \langle x \rangle = \int_0^1 dx\, x = \tfrac{1}{2}, \qquad \langle x^2 \rangle = \int_0^1 dx\, x^2 = \tfrac{1}{3}. $$

The variance of each $x_i$ is (using equation (1.34))

$$ \sigma^2 = \langle x^2 \rangle - \langle x \rangle^2 = \tfrac{1}{3} - \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{12}. $$

Using the results of Exercise 5.2, the mean and variance of the sum of $N$ of these
random variables are $\langle X \rangle = N\langle x \rangle = N/2$ and $\sigma_X^2 = N\sigma^2 = N/12$, respectively.


Solution 5.4 


The probability density is $p(t) = \exp(-t/\tau)/\tau$ for $t \ge 0$, and zero for $t < 0$. The
first and second moments are obtained using successive integration by parts:

$$ \begin{aligned} \langle t \rangle &= \int_0^{\infty} dt\, \frac{t}{\tau} \exp(-t/\tau) = \int_0^{\infty} dt\, \exp(-t/\tau) = \tau, \\ \langle t^2 \rangle &= \frac{1}{\tau} \int_0^{\infty} dt\, t^2 \exp(-t/\tau) = 2\tau \int_0^{\infty} dt\, \frac{t}{\tau} \exp(-t/\tau) = 2\tau^2. \end{aligned} $$

The question states that the manufacturer claims that $\langle t \rangle = 100$ hours, so we should
take $\tau = 100$ hours. The variance is $\sigma^2 = \langle t^2 \rangle - \langle t \rangle^2 = \tau^2$. Assuming that $N = 10$
may be regarded as a large enough number to justify applying the central limit
theorem, the probability density for the time $T$ for $N$ bulbs to fail is then

$$ p_T(T) = \frac{1}{\sqrt{2\pi N}\, \tau} \exp\left[-\frac{(T - N\tau)^2}{2N\tau^2}\right]. $$

The probability for $N$ bulbs having failed between $T = 0$ and $T = 580$ hours is

$$ P(T < 580) = \int_0^{580} dT\, p_T(T) = \mathcal{N}\!\left(\frac{580 - N\tau}{\sqrt{N}\, \tau}\right) - \mathcal{N}(-\sqrt{N}) \approx \mathcal{N}\!\left(\frac{580 - N\tau}{\sqrt{N}\, \tau}\right), \tag{5.44} $$

where $\mathcal{N}(x)$ is the normal probability distribution defined in Section 1.3, and the
change of variable $x = (T - N\tau)/(\sqrt{N}\, \tau)$ was used. (The final approximation is justified
by the fact that $\mathcal{N}(-\sqrt{N})$ is negligibly small when $N$ is large.) We find
that

$$ \frac{580 - N\tau}{\sqrt{N}\, \tau} = \frac{580 - 10 \times 100}{\sqrt{10} \times 100} = -1.328. $$

Referring to Table 1.2, we see that $\mathcal{N}(-1) \approx 0.16$ and $\mathcal{N}(-1.5) \approx 0.067$; $\mathcal{N}(-1.328)$
must lie between these values. Based on the manufacturer’s figure, the probability
of waiting 580 hours or less for 10 bulbs to fail is therefore greater than 0.067. This
is not highly improbable, and more observations might be required before disputing
the manufacturer’s claim.
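The bracketing from Table 1.2 can be refined with a direct evaluation. The sketch below assumes that the tabulated function $\mathcal{N}(x)$ is the cumulative probability of a standard Gaussian variable (this is consistent with the quoted values $\mathcal{N}(-1) \approx 0.16$ and $\mathcal{N}(-1.5) \approx 0.067$) and evaluates it with the complementary error function from Python's standard library. The variable names are illustrative.

```python
import math

def normal_cdf(z):
    """Probability that a standard Gaussian variable is less than z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

tau = 100.0        # claimed mean lifetime in hours
N = 10             # number of bulbs
T = 580.0          # observed total time in hours

z = (T - N * tau) / (math.sqrt(N) * tau)
probability = normal_cdf(z) - normal_cdf(-math.sqrt(N))
print(round(z, 3), probability)
```

The second printed number is the estimate of $P(T < 580)$ from equation (5.44), including the small correction term $\mathcal{N}(-\sqrt{N})$ that the solution neglects.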


Solution 5.5 


If $x$ is the weight of a single coin, the mean value for the weight $X = \sum_{i=1}^{N} x_i$ of
$N = 100$ coins is $\langle X \rangle = N\langle x \rangle = 100 \times 10\,\mathrm{g} = 1000\,\mathrm{g}$. If $\sigma^2$ is the variance of the
mass of an individual coin, the variance of the weight of $N$ coins is $N\sigma^2 = 100 \times (0.1\,\mathrm{g})^2 = 1\,\mathrm{g}^2$,
for which the corresponding standard deviation is 1 g. According
to the central limit theorem, the probability density for the weight of 100 coins,
expressed in grams, is therefore

$$ p_X(X) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(X - 1000)^2}{2}\right]. $$

The probability for the weight of $N = 100$ coins being closer to the mean weight of
99 or fewer coins than the mean weight of 100 coins is equal to the probability that
the weight is less than 995 g. This is

$$ \begin{aligned} P(X < 995) &= \frac{1}{\sqrt{2\pi}} \int_0^{995} dX\, \exp\left[-\frac{(X - 1000)^2}{2}\right] \\ &\approx \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-5} dy\, \exp\left(-\frac{y^2}{2}\right) = \mathcal{N}(-5), \end{aligned} $$

using the change of variable $y = X - 1000$ in the final step. The approximation
involved in changing the lower limit of integration is justified by the fact that the
integrand is negligibly small when $X < 0$. From Table 1.2 in Section 1.3, we see
that $\mathcal{N}(-4) \approx 3 \times 10^{-5}$, and $\mathcal{N}(-5)$ will be even smaller. There would be an equal
probability for the weight to be greater than 1005 g. The sum of these is much
less than $10^{-4}$, so this method would be considered sufficiently reliable for counting
batches of 100 £1 coins.


The coins should preferably be mixed before counting because batches of coins 
reaching the bank from some sources might be more heavily worn. This would 
invalidate the assumption that the weights are independent. 




Solution 5.6 


This is another example of proof by induction. The result was shown to be true for
$N = 2$ in the derivation of equation (5.6). Assume that the result is true for the
sum of $N$ terms, and use the result (5.6) to obtain $p_{X_{N+1}}$:

$$ \begin{aligned} p_{X_{N+1}} &= p \otimes p_{X_N} \\ &= p \otimes \underbrace{p \otimes p \otimes \cdots \otimes p}_{N-1\ \text{convolutions}} \\ &= \underbrace{p \otimes p \otimes \cdots \otimes p}_{N\ \text{convolutions}}. \end{aligned} $$

Thus we have shown that if the result is true for $X_N$, it is therefore true for $X_{N+1}$,
for any $N \ge 2$. We have already seen that the result is true for $N = 2$, so it is true
in general.


Solution 5.7 


The probability density of each variable is p(x) = x(2x — 1), which is unity on the 
interval from 0 to 1, and zero elsewhere. The probability density of $X = x_1 + x_2$ is
the convolution

$$ p_X(X) = \int_{-\infty}^{\infty} dx\, p(X - x)\, p(x). $$

It follows directly from this expression, and the fact that $p(x)$ is one on $[0, 1]$ and
zero otherwise, that the probability density $p_X$ is the length of the $x$-interval where
both $p(X - x)$ and $p(x)$ are non-zero. If $X < 0$, the integral is zero because there is
no value of $x$ for which both $p(x)$ and $p(X - x)$ are non-zero. For the same reason,
when $X > 2$, the integral is zero. Now consider the case when $0 \le X \le 1$. In this
case the $x$-interval where both $p(x)$ and $p(X - x)$ are non-zero is $0 \le x \le X$, and
the length of this interval is $X$. Finally, consider the case when $1 < X \le 2$. Here,
the $x$-interval where both $p(x)$ and $p(X - x)$ are non-zero is $X - 1 \le x \le 1$, and its
length is $2 - X$. Hence we can write

$$ p_X(X) = \begin{cases} X, & 0 \le X \le 1, \\ 2 - X, & 1 < X \le 2, \\ 0, & \text{otherwise}. \end{cases} $$

Thus the graph of $p_X$ is as shown in Figure 5.4. We can also write $p_X$ symbolically
in other ways: for example,


p= SOK =D 
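The triangular density found in this solution can be compared with a simple simulation. The sketch below is illustrative only: it draws pairs of uniform random numbers with Python's `random` module (a fixed seed is an arbitrary choice made here for repeatability) and compares the observed fraction of sums below a threshold with the area under the triangular density. All function names are hypothetical.

```python
import random

def triangular_density(X):
    """Piecewise density of X = x1 + x2 found in Solution 5.7."""
    if 0.0 <= X <= 1.0:
        return X
    if 1.0 < X <= 2.0:
        return 2.0 - X
    return 0.0

def fraction_below(threshold, n_samples=200000, seed=0):
    """Monte Carlo estimate of P(x1 + x2 < threshold) for uniform x1, x2 on [0, 1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if rng.random() + rng.random() < threshold)
    return hits / n_samples

def area_below(threshold, n_cells=1000):
    """Midpoint-rule integral of the triangular density from 0 to threshold."""
    h = threshold / n_cells
    return sum(triangular_density((i + 0.5) * h) for i in range(n_cells)) * h

for threshold in (0.5, 1.0, 1.5):
    print(threshold, fraction_below(threshold), area_below(threshold))
```

The simulated fractions fluctuate from one seed to another, so only approximate agreement with the integrated density should be expected.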


Solution 5.8 


For Figure 5.5, the function is $f(x) = 1/(1 + x^2)$. The maximum is clearly at
$x_0 = 0$. We have $F(x) = f(x)$ because $f(x_0) = 1$. The Gaussian approximation,
equation (5.25), is $g_N(x) = \exp(-\tfrac{1}{2} aNx^2)$, where $a = -F''(0)$.

The first two derivatives are $F'(x) = -2x/(1 + x^2)^2$ and $F''(x) = (6x^2 - 2)/(1 + x^2)^3$,
so $a = -F''(x_0) = 2$. The Gaussian approximation is then

$$ [f(x)]^N \approx g_N(x) = \exp(-Nx^2). $$

Alternatively, we could have noted that $(1 + x^2)^{-1} = 1 - x^2 + \cdots = 1 - \tfrac{1}{2} ax^2 + \cdots$
(from equation (5.17)), giving $a = 2$ directly.

For Figure 5.6, the function is $f(x) = \mathrm{sinc}(x)$. Again, we have $x_0 = 0$ and $F(x) = f(x)$.
In this case, it is not straightforward to calculate derivatives of $F(x)$, because
$\sin(x)/x$ is undefined at $x = 0$. Instead, we use the Taylor series expansion of $\sin x$
to deduce the Taylor series for $\mathrm{sinc}(x)$:

$$ \mathrm{sinc}(x) = \frac{1}{x}\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right) = 1 - \frac{x^2}{6} + \frac{x^4}{120} - \cdots. $$

Comparing this with the expansion $F(x) = 1 - \tfrac{1}{2} ax^2 + \cdots$, we see that $a = \tfrac{1}{3}$, and
the Gaussian approximation is

$$ [f(x)]^N \approx g_N(x) = \exp(-Nx^2/6). $$


Solution 5.9 


Write $f(x) = R(x)\exp[i\theta(x)]$, where $R \ge 0$ and $\theta$ are real-valued functions of
$x$. Note that $|f| = R$, so $R' = 0$ at a maximum of $|f|$. We have $f' = (R' + i\theta' R)\exp(i\theta)$,
so $f'/f = i\theta'$ at a maximum of $|f|$, because $R' = 0$ there. This
is purely imaginary because $\theta$ is a real function.


Solution 5.10 


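Differentiating $\exp(-ikx)$ twice with respect to $k$ gives $-x^2\exp(-ikx)$. It follows
that differentiating equation (5.30) twice with respect to $k$ gives

$$ \frac{d^2\tilde{p}}{dk^2} = -\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, x^2 \exp(-ikx)\, p(x). $$

Setting $k = 0$ gives

$$ \left.\frac{d^2\tilde{p}}{dk^2}\right|_{k=0} = -\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, x^2\, p(x) = -\frac{1}{\sqrt{2\pi}} \langle x^2 \rangle, $$

which is equation (5.32). Differentiating $\tilde{p}(k)$ $n$ times with respect to $k$ gives a
factor of $(-ix)^n$ in this integral instead of $-x^2$. Multiplying both sides by $\sqrt{2\pi}\, i^n$
and setting $k = 0$ gives

$$ \langle x^n \rangle = \sqrt{2\pi}\, i^n \left.\frac{d^n\tilde{p}}{dk^n}\right|_{k=0} = \sqrt{2\pi}\, i^n\, \tilde{p}^{(n)}(0). $$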


Solution 5.11 
Using equation (5.30), we have

$$ \begin{aligned} |\tilde{p}(k)|^2 = [\tilde{p}(k)]^* \times \tilde{p}(k) &= \frac{1}{2\pi} \int_{-\infty}^{\infty} dx\, \exp(ikx)\, p(x) \int_{-\infty}^{\infty} dy\, \exp(-iky)\, p(y) \\ &= \frac{1}{2\pi} \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, \exp[ik(x - y)]\, p(x)\, p(y) \\ &= \frac{1}{2\pi} \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, \cos[k(x - y)]\, p(x)\, p(y) \\ &\quad + \frac{i}{2\pi} \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, \sin[k(x - y)]\, p(x)\, p(y). \end{aligned} $$

The second term must vanish, because the left-hand side is real. (It can be seen
why this integral is zero by noting that the integrand has odd symmetry about the
line $y = x$.) An expression for $|\tilde{p}(0)|^2$ is obtained by setting $k = 0$ in the above
expression. Subtracting the expression for $|\tilde{p}(k)|^2$ from that of $|\tilde{p}(0)|^2$ leads to the
equation quoted in the hint. Note that $\cos[k(x - y)]$ is never greater than one,
therefore $1 - \cos[k(x - y)]$ can never be negative. The function $1 - \cos[k(x - y)]$ is
equal to zero for all $(x, y)$ in the case $k = 0$, but for all other values of $k$ the product
of the three non-negative functions is positive over a finite area in the $(x, y)$-plane.
Hence $|\tilde{p}(0)|^2 - |\tilde{p}(k)|^2$ is never negative, and it follows that the maximum of $|\tilde{p}(k)|^2$,
and hence of $|\tilde{p}(k)|$, must be at $k = 0$.
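Equation (5.33) can be checked numerically for a particular density. The sketch below assumes the uniform density on $[0, 1]$, for which direct integration of (5.30) gives $|\tilde{p}(k)|^2 = \mathrm{sinc}^2(k/2)/(2\pi)$; that closed form, the midpoint-rule double sum and the function names are choices made here for illustration rather than part of the solution.

```python
import math

def rhs_533(k, n_cells=200):
    """Midpoint-rule estimate of the double integral in (5.33) for p = 1 on [0, 1]."""
    h = 1.0 / n_cells
    points = [(i + 0.5) * h for i in range(n_cells)]
    total = 0.0
    for x in points:
        for y in points:
            total += (1.0 - math.cos(k * (x - y))) * h * h
    return total / (2.0 * math.pi)

def lhs_533(k):
    """|p~(0)|^2 - |p~(k)|^2 using the closed-form transform of the same density."""
    if k == 0.0:
        return 0.0
    modulus_squared = (math.sin(0.5 * k) / (0.5 * k)) ** 2 / (2.0 * math.pi)
    return 1.0 / (2.0 * math.pi) - modulus_squared

for k in (0.0, 1.0, 3.0, 2.0 * math.pi):
    print(k, lhs_533(k), rhs_533(k))
```

Both columns should agree closely and neither should be negative, in line with the argument above.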




Solution 5.12 


Consider the moment $\langle x^n \rangle$, in the case where $p(x) < ax^{-\beta}$ for $x > x_0$:

$$ \langle x^n \rangle = \int_0^{\infty} dx\, p(x)\, x^n = I_1 + I_2, $$

where the integrals $I_1$ and $I_2$ are contributions to the moment from the two regions
$0 \le x \le x_0$ and $x > x_0$, respectively. Note that the lower limit of integration may
be set equal to zero because the random variable is positive, so that $p(x) = 0$ when
$x < 0$. These two integrals are bounded as follows:

$$ I_1 = \int_0^{x_0} dx\, p(x)\, x^n \le x_0^n \int_0^{x_0} dx\, p(x) \le x_0^n \int_0^{\infty} dx\, p(x) = x_0^n, $$

$$ I_2 = \int_{x_0}^{\infty} dx\, x^n\, p(x) < a \int_{x_0}^{\infty} dx\, x^{n-\beta} = \frac{a\, x_0^{n+1-\beta}}{\beta - n - 1}. $$

The final step used in bounding $I_2$ is valid provided that $n + 1 - \beta < 0$. In this
case $\langle x^n \rangle$ exists, because the integrals $I_1$ and $I_2$ have been shown to be finite.

If $p(x) > ax^{-\beta}$ for $x > x_0$, the integral $I_2$ satisfies

$$ I_2 > a \int_{x_0}^{\infty} dx\, x^{n-\beta}, $$

and the integral on the right-hand side diverges as the upper limit approaches
infinity when $n - \beta > -1$. In the case where $n > \beta - 1$, $\langle x^n \rangle$ is infinite.
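The contrast between the convergent and divergent cases can be made concrete by evaluating the bounding integral with a finite upper limit. The sketch below uses the elementary closed form of $\int_{x_0}^{L} dx\, x^{n-\beta}$ (with $a = 1$); the function name and the sample values of $n$, $\beta$ and the upper limit are illustrative assumptions.

```python
import math

def tail_integral(n, beta, x0, upper):
    """Closed form of the integral of x**(n - beta) from x0 to upper."""
    exponent = n + 1.0 - beta
    if exponent == 0.0:
        return math.log(upper / x0)
    return (upper ** exponent - x0 ** exponent) / exponent

for upper in (1e2, 1e4, 1e6):
    # beta = 3.5 satisfies beta > n + 1 for n = 2; beta = 2.5 does not.
    print(upper, tail_integral(2, 3.5, 1.0, upper), tail_integral(2, 2.5, 1.0, upper))
```

As the upper limit grows, the first integral settles towards $x_0^{n+1-\beta}/(\beta - n - 1)$ while the second keeps increasing, which is the behaviour the solution describes.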


Solution 5.13 


Assume that the distribution for the sum of $N$ such random variables is as quoted
in the question. The probability distribution for $N + 1$ Poisson distributed random
variables can then be obtained by calculating the convolution of the distribution for
the sum of $N$ Poisson random variables with the distribution for a single Poisson
random variable:

$$ \begin{aligned} p_{X_{N+1}}(X) &= \int_{-\infty}^{\infty} dx\, p_{X_N}(X - x)\, p(x) \\ &= \int_0^X dx\, \frac{\lambda^N (X - x)^{N-1}}{(N-1)!} \exp[-\lambda(X - x)]\, \lambda \exp(-\lambda x) \\ &= \frac{\lambda^{N+1} \exp(-\lambda X)}{(N-1)!} \int_0^X dx\, (X - x)^{N-1} \\ &= \frac{\lambda^{N+1} X^N}{N!} \exp(-\lambda X). \end{aligned} $$

Note that $p_{X_{N+1}}(X) = 0$ for $X < 0$ because the integrand in the first line will then
be zero for all $x$. The second line, valid provided $X \ge 0$, used the fact that the
probability densities are zero for negative values, to restrict the range of integration.

Setting $N = 0$ in this expression, or $N = 1$ in the expression given in the question,
we obtain the Poisson distribution itself, so the result for $p_{X_N}(X)$ is correct for
$N = 1$. The calculation above shows that if the result is true for $N$, it is true for
$N + 1$. The expression given in the question is therefore true for all integers $N \ge 1$.
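The induction step can be spot-checked numerically by convolving equation (5.42) for $N$ terms with the single-variable density (5.41) and comparing the result with (5.42) for $N + 1$ terms. The sketch below does this with a midpoint rule; the function names, the values of $\lambda$ and $X$, and the number of cells are arbitrary illustrative choices.

```python
import math

def single_density(x, lam):
    """Equation (5.41)."""
    return lam * math.exp(-lam * x) if x >= 0.0 else 0.0

def sum_density(X, N, lam):
    """Equation (5.42)."""
    if X < 0.0:
        return 0.0
    return lam ** N * X ** (N - 1) * math.exp(-lam * X) / math.factorial(N - 1)

def convolved_density(X, N, lam, n_cells=2000):
    """Midpoint-rule estimate of the convolution of sum_density(., N) with single_density."""
    h = X / n_cells
    total = 0.0
    for i in range(n_cells):
        x = (i + 0.5) * h
        total += sum_density(X - x, N, lam) * single_density(x, lam) * h
    return total

lam, X = 1.5, 2.5
for N in (1, 2, 3, 4):
    print(N, convolved_density(X, N, lam), sum_density(X, N + 1, lam))
```

Agreement at a handful of points is of course not a proof, but it is a quick guard against algebraic slips in the induction step.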




Solution 5.14 
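Taking logarithms, the function $\phi_N(X)$ is

$$ \phi_N(X) = -\ln[p_{X_N}(X)] = \lambda X - (N - 1)\ln X - \ln\left[\frac{\lambda^N}{(N-1)!}\right], $$

with $X > 0$. The first two derivatives with respect to $X$ are $\phi_N'(X) = \lambda - (N - 1)/X$
and $\phi_N''(X) = (N - 1)/X^2$. The function $\phi_N(X)$ is stationary when
$0 = \lambda - (N - 1)/X$, i.e. at $X_0 = (N - 1)/\lambda$. The second derivative at this point is
$\phi_N''(X_0) = \lambda^2/(N - 1) > 0$, so the stationary point is a minimum. The Taylor
series about the minimum is

$$ \begin{aligned} \phi_N(X) &= \phi_N(X_0) + \tfrac{1}{2}(X - X_0)^2\, \phi_N''(X_0) + O((\Delta X)^3) \\ &= (N - 1) - (N - 1)\ln\left(\frac{N - 1}{\lambda}\right) - N\ln\lambda + \ln (N - 1)! + \frac{\lambda^2}{2(N - 1)}\left(X - \frac{N - 1}{\lambda}\right)^2 + O((\Delta X)^3) \\ &= -\ln\lambda + \tfrac{1}{2}\ln[2\pi(N - 1)] + \frac{\lambda^2}{2(N - 1)}\left(X - \frac{N - 1}{\lambda}\right)^2 + O((\Delta X)^3) + O(1/N), \end{aligned} $$

where $\Delta X = X - (N - 1)/\lambda$, and Stirling’s formula for $\ln (N - 1)!$ (see Chapter 2,
equation (2.33)) was used to simplify the constant term.

Exponentiating gives a Gaussian approximation for the probability density:

$$ p_{X_N}(X) \approx \frac{\lambda}{\sqrt{2\pi(N - 1)}} \exp\left[-\frac{(\lambda X - N + 1)^2}{2(N - 1)}\right]. $$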


Solution 5.15 


The mean and variance of $X$ are $\langle X \rangle = N\langle x \rangle = N/\lambda$ and $\sigma_X^2 = N\sigma^2 = N/\lambda^2$, respectively. The central limit theorem is applicable to this problem, and gives the following approximation for $p_{X_N}(X)$:

$$ p_{X_N}(X) \approx \frac{\lambda}{\sqrt{2\pi N}} \exp\left[-\frac{(\lambda X - N)^2}{2N}\right]. $$

This is slightly different from the result of the previous exercise: $N - 1$ is replaced by $N$ throughout. The difference between these expressions becomes negligible in the limit as $N \to \infty$.

In the following table, $p_{\text{exact}}(X)$ is given by the formula obtained in Exercise 5.13, $p_{N-1}(X)$ is the approximation obtained in Exercise 5.14, and $p_N(X)$ is the approximation obtained from the central limit theorem above.


Table 5.1

$X$ | $p_{\text{exact}}(X)$ | $p_{N-1}(X)$ | $p_N(X)$
10 | 0.125 | |
13 | 0.0661 | |
20 | 0.00291 | 0.000160 | 0.000850


The Gaussian approximations are accurate close to the maximum of the probability 
density. Away from the maximum, their relative error is large, although the absolute 
error is small. 
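The comparison in Table 5.1 can be reproduced with a few lines of Python. The sketch below evaluates the exact density from Exercise 5.13 and the two Gaussian approximations. The values $N = 10$ and $\lambda = 1$ are an assumption: they are not stated alongside the table, but they reproduce the row for $X = 20$. The function names are chosen here for illustration only.

```python
import math

def p_exact(X, N, lam):
    """Exact density lam^N X^(N-1) exp(-lam X) / (N-1)! for the sum of N variables."""
    return lam**N * X**(N - 1) * math.exp(-lam * X) / math.factorial(N - 1)

def p_gauss(X, m, lam):
    """Gaussian approximation with m = N - 1 (Exercise 5.14) or m = N (central limit theorem)."""
    return lam / math.sqrt(2.0 * math.pi * m) * math.exp(-(lam * X - m) ** 2 / (2.0 * m))

N, lam = 10, 1.0  # ASSUMED values, inferred from the table row for X = 20
for X in (10, 13, 20):
    print(X, p_exact(X, N, lam), p_gauss(X, N - 1, lam), p_gauss(X, N, lam))
```

Near the maximum ($X \approx 10$) the three columns nearly coincide, whereas in the tail ($X = 20$) the approximations differ from the exact value by large factors even though every entry is small.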




CHAPTER 6 


Microscopic Derivation of the Diffusion Equation


6.1 Introduction 


The material in this chapter will not be assessed, because some of it may be 
conceptually difficult, particularly if you had little contact with the concepts 
of probability and statistics before starting the course. It will be possible to 
gain full marks without having read this chapter. 


This chapter is included because it finally draws together the two strands 
discussed earlier, namely the macroscopic and deterministic description of 
diffusion considered in Chapters 3 and 4, and the microscopic description 
in terms of the random walk model for the motion of molecules, which was 
introduced in Chapter 2, and supported by the discussion of probability 
and statistics in Chapters 1 and 5. Studying this chapter will deepen your 
understanding of random walks and diffusion, and we recommend that you 
read it. 


You have seen that there are close connections between diffusion and random 
walks. Let us start by recalling some of the similarities. In Section 3.7 you 
saw that the diffusion equation 

$$ \frac{\partial c}{\partial t} = D\,\frac{\partial^2 c}{\partial x^2} \qquad (6.1) $$

has a solution in the form of a Gaussian function 


$$ c(x,t) = \frac{N}{\sqrt{4\pi D t}} \exp\left(-\frac{x^2}{4Dt}\right) \qquad (6.2) $$
which represents (for t > 0) the concentration coming from N particles, all 
of which are initially located at position x = 0 at time t = 0. As discussed 
in Section 3.7, we can obtain the probability density for the position of a 
single particle by dividing the concentration by the number of particles: 
p(x,t) = c(x,t)/N. The variance of this Gaussian probability density is 
proportional to the time elapsed since the start of the diffusion process: 


$$ \langle x^2 \rangle = \int_{-\infty}^{\infty} dx\, x^2 p(x,t) = 2Dt. \qquad (6.3) $$


This result was discussed in Section 3.7.

Similar expressions occurred in our study of random walks in Chapter 2, and these indicate that the random walk is the microscopic process which causes diffusion. In Sections 2.5 and 2.7, we discussed the probability for a random walk which makes steps $+\delta X$ or $-\delta X$, each with probability $\tfrac{1}{2}$, at a sequence of times separated by $\delta T$. The probability $P(X,T)$ to reach position $X$ at time $T$ satisfies

$$ P(X, T + \delta T) = \tfrac{1}{2}\,[P(X - \delta X, T) + P(X + \delta X, T)]. \qquad (6.4) $$


In Section 2.7 it was argued that this equation may be thought of as a discrete form of the diffusion equation, with diffusion constant $D = \delta X^2 / 2\,\delta T$. Section 2.6 considered an approximate solution of equation (6.4), in the case where $\delta X = \delta T = 1$ (so that $D = \tfrac{1}{2}$). It was shown that the solution of
equation (6.4), starting with the particle located at X = 0 when T = 0, is 


well approximated by a Gaussian function 


$$ P_{\text{app}}(X,T) = \frac{2}{\sqrt{2\pi T}} \exp(-X^2/2T), \qquad (6.5) $$

which is of the same form as equation (6.2) if we set $D = \tfrac{1}{2}$ (apart from the


multiplying constant). The variance of the probability distribution for the 
simple random walk was shown (Section 2.4) to be 


$$ \langle X^2 \rangle = T \qquad (6.6) $$

when $\delta X = \delta T = 1$. Again, this is consistent with the result obtained from the diffusion equation, namely equation (6.3), when $D = \tfrac{1}{2}$.
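The statements above can be checked by iterating equation (6.4) directly. The following sketch takes $\delta X = \delta T = 1$ and starts from $P(0,0) = 1$; the function name and the choice $T = 100$ are illustrative assumptions. It computes the variance and compares $P(0,T)$ with the Gaussian $2\exp(-X^2/2T)/\sqrt{2\pi T}$, where the factor 2 reflects the fact that only sites with the same parity as $T$ can be occupied.

```python
import math

def walk_distribution(T):
    """Iterate P(X, T+1) = [P(X-1, T) + P(X+1, T)] / 2, starting from P(0, 0) = 1."""
    P = {0: 1.0}
    for _ in range(T):
        new = {}
        for X, prob in P.items():
            # Each site passes half of its probability to each neighbour.
            new[X - 1] = new.get(X - 1, 0.0) + 0.5 * prob
            new[X + 1] = new.get(X + 1, 0.0) + 0.5 * prob
        P = new
    return P

T = 100
P = walk_distribution(T)
total = sum(P.values())
variance = sum(X * X * prob for X, prob in P.items())
gauss_at_origin = 2.0 / math.sqrt(2.0 * math.pi * T)
print(total, variance, P[0], gauss_at_origin)
```

The variance equals $T$ up to rounding error, as in equation (6.6), and the value at the origin agrees with the Gaussian form to within a fraction of a per cent.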


The connections between random walks and diffusion are clearly very close, 
because they both have a Gaussian solution with a variance that is propor- 
tional to time. It has already been explained that diffusion results from the 
random motion of molecules, but the derivation of the diffusion equation 
in Chapter 3 did not start from this microscopic viewpoint. In Chapter 3 
(Section 3.5) the diffusion equation was derived from Fick’s law, relating 
the flux density to the concentration gradient, $\mathbf{J} = -D\nabla c$. This is an intuitively appealing assumption, but it was not justified from any microscopic
model of the particle motion. In this chapter, we shall obtain the diffusion 
equation starting from the assumption that the diffusing molecules follow 
independent random walks, in which the particles make a very large number 
of very short random steps, as illustrated in Figure 2.4. 


The first task, addressed in Section 6.2, is to consider a generalisation of the 
random walk model which is suitable for modelling the motion of molecules. 
Section 6.3 discusses a derivation of the diffusion equation which is based 
upon the central limit theorem considered in Chapter 5. This derivation is 
applicable only for an infinite medium (that is, a region without boundaries), 
and it is desirable to have a more widely applicable derivation. Section 6.4 
discusses a further generalisation of the random walk, and derives a general 
equation for its probability density. This equation is called the generalised 
diffusion equation or the Fokker—Planck equation. The diffusion equation is 
a special case of the Fokker—Planck equation. 




6.2 Continuous random walks 


In Chapter 2 we introduced the idea that a molecule moving in a gas un- 
dergoes many collisions with other molecules, illustrated schematically in 
Figure 2.4. The motion of a molecule can be modelled by a random walk,
but we must extend the definition of a random walk in various ways before 
it can be used to model the motion of molecules. 


First, the collisions between molecules in a gas or liquid are very frequent, so the time $\delta t$ between collisions is very short. In practice, the time between collisions is so short (typically $10^{-10}$ s, as we shall see in Exercise 6.2) that we can take the limit as $\delta t \to 0$. In this limit, the displacement $x(t)$ of the particle becomes a continuous function of time. We describe the limit as $\delta t \to 0$ in Subsection 6.2.1. Another minor extension of the earlier models for random walks, which is introduced there, is that the displacement at each step is a continuous, rather than discrete, random variable.


The other generalisation of the random walk that is required is its extension 
to three dimensions: this is considered in Subsection 6.2.2. 


6.2.1 The continuous random walk in one 
dimension 


In Chapter 2 (Sections 2.3 and 2.4) we investigated a discrete random walk, with the displacement $X$ allowed to take only integer values. The displacements were changed at times $T$, with time having unit spacing. Here we consider a random walk where the displacement $\Delta x_n$ at the $n$th step can take a continuous range of values, with probability density $p_s(\Delta x)$. The steps are separated by a short time interval $\delta t$, so the displacement after time $t$ is

$$ x(t) = \sum_{n=1}^{M} \Delta x_n, \quad \text{where } M = \mathrm{Int}(t/\delta t). \qquad (6.7) $$

It is assumed that the displacements $\Delta x_n$ are independent random variables. This expression is analogous to equation (2.13) of Chapter 2. Initially, we shall consider the case where the mean value of $\Delta x_n$ is zero, and where the variance of $\Delta x_n$ is independent of the current position of the particle and of time, so

$$ \langle \Delta x_n \rangle = 0 \quad \text{and} \quad \langle \Delta x_n \Delta x_m \rangle = \sigma^2 \delta_{nm} \qquad (6.8) $$

for some constant $\sigma$. These relations are analogous to equations (2.3) and (2.4) of Chapter 2; here $\sigma$ is the typical size of the displacement at each step. Figure 6.1 illustrates several realisations of the random walk described by equations (6.7) and (6.8).




Note that the subscript ‘s’ in $p_s$ stands for ‘step’.

Here Int(x) means ‘integer part of x’.

Here $\delta_{nm}$ is the Kronecker delta symbol.




Figure 6.1 Eight realisations of the random walk described by equations (6.7) and (6.8). Here $\sigma = 0.1$, $\delta t = 0.01$, and $p_s$ is a Gaussian probability density.


Following the approach introduced in Chapter 2 (Section 2.4), we describe the properties of this random walk by calculating the mean and variance of the displacement after time $t$. Taking the mean value of equation (6.7), and using equations (1.44) and (6.8), we see immediately that

$$ \langle x(t) \rangle = \sum_{n=1}^{M} \langle \Delta x_n \rangle = 0. \qquad (6.9) $$

This reflects the fact that the displacement $x(t)$ is equally likely to be positive or negative. The typical size of the displacement is understood by calculating the mean of $[x(t)]^2$, using equation (6.8):

$$ \langle [x(t)]^2 \rangle = \sum_{n=1}^{M} \sum_{m=1}^{M} \langle \Delta x_n \Delta x_m \rangle = M\sigma^2 \approx \frac{\sigma^2 t}{\delta t}. \qquad (6.10) $$

Because $M = \mathrm{Int}(t/\delta t)$, the error in the final approximation is no greater than $\sigma^2$.


We want to use this model to describe the motion of molecules, where the time between collisions is very short. We therefore consider random walks for which the time step $\delta t$ is very small, and we shall take the limit as $\delta t \to 0$, so that the position $x(t)$ is a continuous function exhibiting a random walk. We want this continuous random walk to describe motion with a given value of the diffusion constant $D$, so we should choose the values of $\sigma$ and $\delta t$ such that $\langle [x(t)]^2 \rangle = 2Dt$. We do this by writing

$$ \sigma^2 = 2D\,\delta t, \qquad (6.11) $$

so equation (6.10) becomes equivalent to equation (6.3). Because the error in the final step of equation (6.10) is less than $\sigma^2$, and $\sigma^2$ is proportional to $\delta t$, the error in the final approximate step in equation (6.10) vanishes as $\delta t \to 0$.


We emphasise that this is a simplified model for the motion of molecules, which 
can be improved upon in various ways (for example, by taking account of the 
fact that the interval between collisions may not be constant). Our objective 
here is to show how the diffusion equation arises from a microscopic model 
of particle motion, and for this purpose our simple model is sufficient. More 
realistic and detailed models confirm that the diffusion equation describes the 
variation of concentration, and also enable the diffusion constant to be calcu- 
lated from first principles. 


In Section 2.5 we considered the probability $P(X,T)$ for the displacement of a discrete random walk, and in Section 2.6 it was shown that this is well approximated by a Gaussian function. Now let us consider the analogous question for our continuous random walk. The displacement $x(t)$ for the continuous random walk can take a continuous range of values, so its distribution must be described by a probability density, which we denote $p(x,t)$. In the limit as $\delta t \to 0$, the number of elements in the sum (6.7) approaches infinity, so the central limit theorem may be applicable. The other conditions of the central limit theorem as stated in Chapter 5 are met: we have a sum of independent and identically distributed random variables $\Delta x_n$, for which the mean value $\langle \Delta x_n \rangle = 0$ and variance $\langle \Delta x_n^2 \rangle = 2D\,\delta t$ both exist. The mean and variance of $x(t)$ were obtained above: we have $\langle x(t) \rangle = 0$ from equation (6.9) and $\langle x(t)^2 \rangle = 2Dt$ from equations (6.10) and (6.11). Applying the central limit theorem (using equations (5.2) to (5.4)), we see that in the limit as $\delta t \to 0$, the probability density for the displacement $x(t)$ is a Gaussian function of $x$:

$$ p(x,t) = \frac{1}{\sqrt{4\pi D t}} \exp(-x^2/4Dt). \qquad (6.12) $$
This is identical in form to equation (6.2) (after dividing by N, so that the 
solution represents a probability density for a diffusing particle). That equa- 
tion was previously shown (Exercise 3.21) to satisfy the diffusion equation. 
Solutions of the form (6.12) were plotted in Figure 3.10. 
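The argument leading to equation (6.12) can be illustrated by a small simulation. The sketch below is an illustration under stated assumptions rather than part of the derivation: the step density $p_s$ is taken to be uniform on $[-a, a]$ with $a^2/3 = 2D\,\delta t$ (so that equation (6.11) holds), and the values of $D$, $\delta t$ and the number of walkers are arbitrary. Although the individual steps are not Gaussian, the final positions should have mean zero, mean square $2Dt$, and about 68% of them within one standard deviation of the origin, as expected for a Gaussian.

```python
import math
import random

def final_positions(D, dt, steps, walkers, seed=0):
    """Positions x(t) at t = steps*dt for independent walkers starting at x = 0."""
    rng = random.Random(seed)
    a = math.sqrt(6.0 * D * dt)  # uniform steps on [-a, a]: variance a^2/3 = 2*D*dt
    return [sum(rng.uniform(-a, a) for _ in range(steps)) for _ in range(walkers)]

D, dt, steps = 0.5, 0.01, 100
xs = final_positions(D, dt, steps, walkers=4000)
t = steps * dt
mean = sum(xs) / len(xs)
mean_square = sum(x * x for x in xs) / len(xs)
within_one_sigma = sum(abs(x) < math.sqrt(2.0 * D * t) for x in xs) / len(xs)
print(mean, mean_square, 2.0 * D * t, within_one_sigma)
```

Halving `dt` while doubling `steps` leaves $t$ and $2Dt$ unchanged, which is the sense in which the limit $\delta t \to 0$ is taken at fixed $D$.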


6.2.2 Random walks in two and three dimensions 


The motion of molecules diffusing in a gas or liquid occurs in three di- 
mensions. (Figure 2.4 is a two-dimensional schematic illustration of this 
motion.) As well as considering the limiting case where the random steps 
are extremely short, we must also consider how to describe a random walk 
in three dimensions. In this section we state the equations that define the 
random walk in three dimensions. This random walk is used as a model 
for the microscopic motion of particles undergoing diffusion. In Sections 6.3 
and 6.4 it is shown that this model leads to a derivation of the diffusion 
equation. 


In its simplest form (considered in this section), the extension of the defini- 
tion of a random walk to three dimensions is straightforward: each Cartesian 
component of the position of a particle makes an independent random walk. 
Stating this more formally, we can describe a continuous random walk in 
three dimensions as follows. The displacements at the nth time step in the 
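directions of the x-, y- and z-axes, $\Delta x_n$, $\Delta y_n$ and $\Delta z_n$, are independent random variables, satisfying the following equations:

$$ \begin{aligned}
\langle \Delta x_n \rangle = \langle \Delta y_n \rangle = \langle \Delta z_n \rangle &= 0 \\
\langle \Delta x_n^2 \rangle = \langle \Delta y_n^2 \rangle = \langle \Delta z_n^2 \rangle &= 2D\,\delta t \qquad (6.13) \\
\langle \Delta x_n \Delta y_n \rangle = \langle \Delta x_n \Delta z_n \rangle = \langle \Delta y_n \Delta z_n \rangle &= 0.
\end{aligned} $$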




The first line of the above equations specifies that the mean values of the displacements in the three directions are zero. The second line specifies that the mean-squared displacements in each direction are all equal to $2D\,\delta t$, as for the one-dimensional case. The third line states that the displacements in different directions are all uncorrelated with each other (which is a consequence of the fact that they are independent). As in Subsection 6.2.1, we take the limit as $\delta t \to 0$ in order to model the very frequent collisions experienced by a molecule undergoing diffusion.


There is an alternative notation which is more compact than equations (6.13), and which will be used in preference later in this chapter. Let the position of the particle be $\mathbf{r}(t)$, where $\mathbf{r}$ has components $(x_1, x_2, x_3) = (x, y, z)$. The vector displacement at time $n\,\delta t$ is $\Delta\mathbf{r}_n$. The components can be labelled by an index $i$, so that the displacement may be written

$$ \Delta\mathbf{r}_n = (\Delta x_n, \Delta y_n, \Delta z_n) = (\Delta x_{1,n}, \Delta x_{2,n}, \Delta x_{3,n}). \qquad (6.14) $$

With this notation, equations (6.13) can be expressed more concisely as follows. The mean displacements are equal to zero,

$$ \langle \Delta x_{i,n} \rangle = 0, \quad i = 1, 2, 3, \qquad (6.15) $$

and the mean values of products of displacements are

$$ \langle \Delta x_{i,n}\, \Delta x_{j,m} \rangle = 2\delta_{ij}\delta_{nm} D\,\delta t, \quad i, j = 1, 2, 3. \qquad (6.16) $$

The factor $\delta_{nm}$ is the Kronecker delta symbol, indicating that for different time steps ($n \neq m$), the displacements $\Delta x_{i,n}$ have no correlation. Figure 6.2 is a schematic illustration of the motion described by equations (6.15) and (6.16) in two dimensions.


Figure 6.2 Schematic illustration of the steps of a random walk in two dimensions.
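Equations (6.13) translate directly into a simulation in which each Cartesian component takes its own independent steps. The sketch below assumes Gaussian steps and arbitrary illustrative values of $D$, $\delta t$ and the number of walkers; none of these choices comes from the text. It checks that each component has mean square $2Dt$, that different components are uncorrelated, and that the mean-squared distance from the origin is the sum of the three component contributions, $6Dt$.

```python
import math
import random

def walk_3d(D, dt, steps, rng):
    """End point of one walk whose x, y, z components take independent Gaussian steps."""
    s = math.sqrt(2.0 * D * dt)  # standard deviation of each component per step
    x = y = z = 0.0
    for _ in range(steps):
        x += rng.gauss(0.0, s)
        y += rng.gauss(0.0, s)
        z += rng.gauss(0.0, s)
    return x, y, z

rng = random.Random(1)
D, dt, steps, walkers = 0.5, 0.01, 100, 3000
ends = [walk_3d(D, dt, steps, rng) for _ in range(walkers)]
t = steps * dt
mean_x2 = sum(x * x for x, _, _ in ends) / walkers
mean_xy = sum(x * y for x, y, _ in ends) / walkers
mean_r2 = sum(x * x + y * y + z * z for x, y, z in ends) / walkers
print(mean_x2, 2 * D * t, mean_xy, mean_r2, 6 * D * t)
```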


Exercise 6.1 


Taking the limit as $\delta t \to 0$ such that the diffusion constant $D$ remains fixed, write down a probability density for the particle to reach $(x, y, z)$ after time $t$, starting at the origin when $t = 0$.

How does this compare with the result of Exercises 3.2 and 3.24 in the two-dimensional case?

Hint: Recall that probability densities for independent variables are multiplied.




Exercise 6.2 


The diffusion constant $D$ for carbon dioxide molecules diffusing through air is measured to be $1.29 \times 10^{-5}\ \mathrm{m^2\,s^{-1}}$. Between collisions, the molecules move at a speed $v$ which is comparable to the speed of sound, roughly $260\ \mathrm{m\,s^{-1}}$. Make rough estimates of the typical distance that the carbon dioxide molecules travel between collisions, and of the number of collisions each carbon dioxide molecule experiences every second. (Because you are asked for rough estimates, it is sufficient to use formulae from Subsection 6.2.1 which apply to a one-dimensional model.)


6.3 From random walks to the 
diffusion equation 


You have now seen how to extend the definition of the random walk to three dimensions, and to the limit where the time step, $\delta t$, is taken to zero.
Now it will be shown how this model for the microscopic motion of molecules 
leads to the diffusion equation. In this section you will be shown a simple 
derivation which uses the central limit theorem, but which applies only when 
the particles move in a region without boundaries. Section 6.4 will give a 
much more general derivation. 


In the following, we assume that individual particles follow independent ran- 
dom walks, similar to that shown schematically in Figure 2.4, and show that 
the concentration of particles satisfies the diffusion equation. For simplicity, 
we consider a situation in which the concentration depends upon only one 
Cartesian coordinate, so that we need to consider only a one-dimensional 
situation; the extension to three dimensions is straightforward. In this sec- 
tion we assume that particles are diffusing in an infinite medium; the more 
difficult case of a medium with boundaries is treated in the next section. 


Consider the concentration of particles that results from $N$ particles being released at position $x_0$ at time $t_0$. What is the concentration of particles at time $t$? The probability for any given particle moving into the small interval between $x$ and $x + \delta x$ is $\delta P = p(x - x_0, t - t_0)\,\delta x$, where $p(\Delta x, \Delta t)$ is the probability density for moving a distance $\Delta x$ after time $\Delta t$. If $N$ is very large, it follows (from the definition of probability; see Section 1.1) that the number of particles in this interval is approximately

$$ \delta N = N\,\delta P = N p(x - x_0, t - t_0)\,\delta x. \qquad (6.17) $$

Using equation (3.5), the concentration in this interval is

$$ c(x,t) = \frac{\delta N}{\delta x} = N p(x - x_0, t - t_0). \qquad (6.18) $$


For a particle executing a random walk, we have already seen that the probability density $p$ is the Gaussian function of equation (6.12). In the case where the particles are all initially positioned at $x_0$ at time $t_0$, the concentration is (for $t > t_0$)

$$ c(x,t) = \frac{N}{\sqrt{4\pi D(t - t_0)}} \exp\left[-\frac{(x - x_0)^2}{4D(t - t_0)}\right]. \qquad (6.19) $$

This expression is in exact agreement with a solution of the diffusion equation, namely equation (3.57). This establishes that the concentration of particles undergoing independent random walks obeys the diffusion equation, for the case where all of the particles start at the same position, and diffuse in an infinite medium.

Hint (Exercise 6.2): The typical distance between collisions is $\sigma \sim v\,\delta t$. The values of $\sigma$ and $\delta t$ are related to the diffusion constant $D$ through equation (6.11).


If the particles do not all start at the same position, the argument can easily be generalised. Let us assume that the $N$ particles have initial concentration $c(x, t_0)$ at time $t_0$. The number of particles initially in the short interval from $x_0$ to $x_0 + \delta x_0$ is $c(x_0, t_0)\,\delta x_0$. The number of these particles which reach the short interval from $x$ to $x + \delta x$ at time $t$ (with $t > t_0$) is given by substituting $c(x_0, t_0)\,\delta x_0$ for $N$ in equation (6.17): this number is

$$ \delta N(x_0) = p(x - x_0, t - t_0)\,\delta x \times c(x_0, t_0)\,\delta x_0. \qquad (6.20) $$

To determine the total number of particles in the interval from $x$ to $x + \delta x$ at time $t$, we must sum contributions of the form (6.20) coming from a set of short intervals covering the whole line (see Figure 6.3). If we take the $n$th interval to be $[n\,\delta x_0, (n+1)\,\delta x_0]$, the number of particles in the interval of width $\delta x$ at position $x$ is

$$ \delta N = \sum_{n=-\infty}^{\infty} \delta N(n\,\delta x_0) = \sum_{n=-\infty}^{\infty} p(x - n\,\delta x_0, t - t_0)\,\delta x \times c(n\,\delta x_0, t_0)\,\delta x_0. \qquad (6.21) $$


Figure 6.3 The number of particles in the interval $[x, x + \delta x]$ at time $t$ is the sum of contributions from all of the intervals $[x_0, x_0 + \delta x_0]$ at time $t_0$.


In the limit as $\delta x_0 \to 0$, this becomes an integral

$$ \delta N = \delta x \int_{-\infty}^{\infty} dx_0\, p(x - x_0, t - t_0)\, c(x_0, t_0). \qquad (6.22) $$

To determine the concentration at $x$, we divide both sides by $\delta x$ as before. Now we use the Gaussian form for $p(x - x_0, t - t_0)$ (equation (6.12)) obtained from the central limit theorem. We find

$$ c(x,t) = \frac{1}{\sqrt{4\pi D(t - t_0)}} \int_{-\infty}^{\infty} dx_0\, \exp\left[-\frac{(x - x_0)^2}{4D(t - t_0)}\right] c(x_0, t_0). \qquad (6.23) $$

The only element on the right-hand side which depends upon $x$ and $t$ is the Gaussian function $\exp[-(x - x_0)^2/4D(t - t_0)]/\sqrt{4\pi D(t - t_0)}$. We have already seen that this function satisfies the diffusion equation. Provided that $c(x_0, t_0)$ is a well-behaved function of $x_0$, the operations of partial differentiation can be carried inside the integral, and it follows that equation (6.23) also satisfies the diffusion equation. Note that this solution was discussed earlier, where it occurred (with $t_0 = 0$) as equations (4.7) and (4.8).
The difference is that here it was obtained directly from the random walk 
model of the microscopic particle motion, whereas earlier it was obtained by 
solving the diffusion equation using the convolution theorem. 
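The claim that equation (6.23) satisfies the diffusion equation can be tested numerically. The sketch below is a rough check under assumptions chosen for illustration: the initial concentration is a hypothetical 'top hat' (equal to 1 for $|x_0| < 1$ and zero elsewhere), $t_0 = 0$, the integral is approximated by a midpoint rule, and the derivatives are estimated by finite differences. For this initial condition the integral can also be written in terms of the error function, which gives an independent check on the quadrature.

```python
import math

def gaussian(x, t, D):
    """Probability density (6.12) for displacement x after time t."""
    return math.exp(-x * x / (4.0 * D * t)) / math.sqrt(4.0 * math.pi * D * t)

def concentration(x, t, D, c0, lo, hi, n=400):
    """Midpoint-rule estimate of (6.23) with t0 = 0, for c0 vanishing outside [lo, hi]."""
    h = (hi - lo) / n
    return sum(gaussian(x - (lo + (i + 0.5) * h), t, D) * c0(lo + (i + 0.5) * h)
               for i in range(n)) * h

def top_hat(x0):
    """Hypothetical initial concentration: 1 for |x0| < 1, 0 elsewhere."""
    return 1.0 if abs(x0) < 1.0 else 0.0

D, x, t = 1.0, 0.3, 0.5
c = concentration(x, t, D, top_hat, -1.0, 1.0)

# Closed form for this initial condition, from integrating the Gaussian directly.
s = math.sqrt(4.0 * D * t)
c_erf = 0.5 * (math.erf((x + 1.0) / s) - math.erf((x - 1.0) / s))

# Finite-difference estimates of dc/dt and d2c/dx2 at (x, t).
ht, hx = 1e-4, 1e-2
dc_dt = (concentration(x, t + ht, D, top_hat, -1.0, 1.0)
         - concentration(x, t - ht, D, top_hat, -1.0, 1.0)) / (2.0 * ht)
d2c_dx2 = (concentration(x + hx, t, D, top_hat, -1.0, 1.0) - 2.0 * c
           + concentration(x - hx, t, D, top_hat, -1.0, 1.0)) / hx**2
residual = dc_dt - D * d2c_dx2
print(c, c_erf, dc_dt, residual)
```

The residual is small compared with either term separately, which is what equation (6.23) satisfying the diffusion equation requires.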


At this point it will be useful to review this derivation of the diffusion equation. We started from the assumption that diffusing particles are following independent random walks. We used the central limit theorem to show
that the probability density for the position of any one particle is a Gaus- 
sian function. We then used this to obtain the concentration c(x,t) in the 
form of equation (6.23), and noted that this expression satisfies the diffu- 
sion equation, because the Gaussian factor is observed to satisfy the diffusion 
equation. 


This derivation of the diffusion equation is valid, but a little unsatisfying in 
two respects. First, it does not show that random walks lead to the diffusion 
equation directly, but rather constructs an expression for the concentration 
which is observed to satisfy the diffusion equation. Secondly, this deriva- 
tion is applicable only in an infinite medium, because the Gaussian solution 
is valid only in that case. The next section will discuss generalisations of 
the continuous random walk model for diffusing particles, which can take 
account of situations where the medium in which molecules diffuse is not 
homogeneous. It will lead to a generalised diffusion equation for the proba- 
bility density, proceeding directly from the generalised random walk model. 
This alternative derivation will also be valid for diffusion in a finite medium. 


6.4 The Fokker—Planck equation 


The derivation of the diffusion equation in Section 6.3 started from the as- 
sumption that the particles undergo a continuous random walk, described by 
equation (6.7), with the statistics of the displacements specified by equation 
(6.8). Various extensions of this model arise in studying processes involv- 
ing random motion. Further generalisations of the diffusion process will be 
introduced in Subsection 6.4.1. The probability density for such processes 
can be shown to satisfy a generalisation of the diffusion equation called the 
Fokker—Planck equation. This equation will be derived in Subsections 6.4.2 
and 6.4.3. 


6.4.1 Generalised diffusion processes 


Consider a situation in which the particles are drifting with velocity $v$ as well as making random steps. In this case the mean value of the displacement during the short time step $\delta t$ is

$$ \langle \Delta x \rangle = v\,\delta t. \qquad (6.24) $$

We might also consider situations in which both the mean and variance of the displacement are functions of the current position of the particle. In general, we will assume that the displacement $\Delta x = x(t + \delta t) - x(t)$ of a particle is a random variable, with mean and variance given respectively by

$$ \langle \Delta x \rangle = v(x,t)\,\delta t \qquad (6.25) $$

and

$$ \langle \Delta x^2 \rangle - \langle \Delta x \rangle^2 = 2D(x,t)\,\delta t. \qquad (6.26) $$

Equations (6.25) and (6.26) will be taken as definitions of the drift velocity $v(x,t)$ and diffusion coefficient $D(x,t)$ in cases where these may depend upon position and/or time. (We emphasise that $v(x,t)$ in equation (6.25) is not the randomly varying velocity which causes diffusion, but an additional velocity which depends smoothly on time.) In the next two subsections we shall develop an extension of the diffusion equation for this more general type of random walk; this generalisation is known as the Fokker–Planck equation.
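A simple instance of the generalised random walk has constant drift velocity and constant diffusion coefficient. The sketch below assumes Gaussian steps with mean $v\,\delta t$ and variance $2D\,\delta t$, in line with equations (6.25) and (6.26); the numerical values and the function name are illustrative only. Because the steps are independent, their means and variances add, so after time $t$ the walkers should have mean displacement close to $vt$ and variance close to $2Dt$.

```python
import math
import random

def drifting_walk(v, D, dt, steps, rng):
    """One walk whose steps have mean v*dt and variance 2*D*dt (Gaussian steps assumed)."""
    x = 0.0
    s = math.sqrt(2.0 * D * dt)
    for _ in range(steps):
        x += rng.gauss(v * dt, s)
    return x

rng = random.Random(2)
v, D, dt, steps, walkers = 1.5, 0.5, 0.01, 200, 3000
xs = [drifting_walk(v, D, dt, steps, rng) for _ in range(walkers)]
t = steps * dt
mean = sum(xs) / walkers
variance = sum((x - mean) ** 2 for x in xs) / walkers
print(mean, v * t, variance, 2.0 * D * t)
```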


6.4.2 Probability density for generalised diffusion 


The generalised random walk described by equations (6.25) and (6.26) can 
also be described by calculating the probability density p(x, t) for the particle 
to reach position x at time t. In the following, we will derive a partial 
differential equation for p(x,t), the generalised diffusion equation or Fokker- 
Planck equation. The derivation will be one of the hardest parts of this 
course, but the result, equation (6.41), is easily applied to obtain a partial 
differential equation for p(x,t). These partial differential equations can often 
be solved by the methods discussed in Chapter 4. 


The approach is to determine the probability density at time $t + \delta t$, and to obtain an expression for $\partial p/\partial t$:

$$ \frac{\partial p}{\partial t} = \frac{p(x, t + \delta t) - p(x, t)}{\delta t} + O(\delta t). \qquad (6.27) $$

This results in an expression for $\partial p/\partial t$ which is in the form of a generalisation of the diffusion equation.


Let the probability density be $p(x,t)$, so that the probability of a particle being located in a small interval between $x$ and $x + \delta x$ at time $t$ is $\delta P = p(x,t)\,\delta x$. At this stage, we assume that $p(x,t)$ is normalised, and that it may be differentiated as many times as are required for the subsequent calculations. At each time step, separated by $\delta t$, the particle jumps a random distance $z$. The displacement $z$ is a random variable with probability density $p_s$. (Note that earlier we used $\Delta x$ to represent the random displacement, but it is convenient to change to using the single symbol $z$ from here on.) This probability density may be a function of the position $x$ of the particle before the jump and of time $t$, and is written $p_s(z, x, t)$. Thus, the probability of a particle initially at position $x$ at time $t$ being found in a small interval between $x + z$ and $x + z + \delta z$ at time $t + \delta t$ is $p_s(z, x, t)\,\delta z$.

As before, the subscript ‘s’ denotes ‘step’.


The value of the displacement $z$ at time $t$ is independent of all of the previous steps. The probability of the particle reaching the interval $[x, x + \delta x]$ at time $t + \delta t$, having been in the interval $[x_0, x_0 + \delta x_0]$ at time $t$, is therefore the product of the probability for being in the first interval and the probability of jumping from the first interval to the second. This probability is

$$ \delta P(x_0) = p_s(x - x_0, x_0, t)\,\delta x \times p(x_0, t)\,\delta x_0. \qquad (6.28) $$

Let us now obtain an equation for the probability density at time $t + \delta t$, in terms of the probability density $p(x_0, t)$ at time $t$. The range of $x_0$ is divided into small intervals of width $\delta x_0$. The probability of reaching the interval $[x, x + \delta x]$ at time $t + \delta t$ is equal to the sum of the probabilities for reaching this interval from all of the intervals $[n\,\delta x_0, (n+1)\,\delta x_0]$. This is illustrated in Figure 6.4.


Figure 6.4 The probability to be in the interval $[x, x + \delta x]$ at time $t + \delta t$ is obtained by summing contributions from particles in all of the intervals $[x_0, x_0 + \delta x_0]$ at time $t$. The probability density for making a jump of $z = x - x_0$ from position $x_0$ at time $t$ to position $x$ at time $t + \delta t$ is $p_s(z, x_0, t)$.


Summing contributions in the form of equation (6.28), the probability of being in $[x, x + \delta x]$ at time $t + \delta t$ is

$$ p(x, t + \delta t)\,\delta x = \delta x \sum_{n=-\infty}^{\infty} \delta x_0\, p_s(x - n\,\delta x_0, n\,\delta x_0, t)\, p(n\,\delta x_0, t). \qquad (6.29) $$

In the limit as $\delta x_0 \to 0$, the sum becomes an integral over $x_0$. Dividing both sides by $\delta x$ and taking the limit as $\delta x \to 0$ and $\delta x_0 \to 0$ gives

$$ p(x, t + \delta t) = \int_{-\infty}^{\infty} dx_0\, p_s(x - x_0, x_0, t)\, p(x_0, t). \qquad (6.30) $$

This equation and its derivation are analogous to that of equation (6.22), and also to the calculation in Section 5.3. In this case the difference between the initial and final times is small, but the distribution of the step $z = x - x_0$ need not be Gaussian, and may depend upon the initial position, $x_0$, of the particle.


6.4.3 Derivation of the Fokker—Planck equation 


The objective is now to use equation (6.30) to determine a differential equa- 
tion for the probability density p(x,t); we expect that this will be some 
generalisation of the diffusion equation. 


The function $p_s(z, x, t)$, considered as a function of $z$ for any fixed value of $x$ and $t$, is very sharply peaked about $z = 0$, reflecting the fact that the probability of making a long jump (significantly longer than $\sqrt{D\,\delta t}$) must be very small if the statistics of $z$ satisfy $\langle z^2 \rangle = 2D\,\delta t$, as implied by equation (6.26). The probability density $p_s(x - x_0, x_0, t)$ for reaching $x$ at time $t + \delta t$ starting from $x_0$ at time $t$, and the probability density $p(x, t)$ are illustrated schematically in Figure 6.5.


Figure 6.5 The probability density $p_s(x - x_0, x_0, t)$ for the step from $x_0$ at time $t$ to $x$ at $t + \delta t$ is very sharply peaked around $x_0$ (the width of the peak is approximately $\sqrt{D\,\delta t}$). The probability density $p(x,t)$ at time $t$ is relatively slowly varying.


It is convenient to change the variable of integration in equation (6.30) to $z = x - x_0$, so that the dominant contribution to the integral comes from the region close to $z = 0$. Noting that $x_0 = x - z$, we have

$$ p(x, t + \delta t) = \int_{-\infty}^{\infty} dz\, p_s(z, x - z, t)\, p(x - z, t). \qquad (6.31) $$
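We now consider how to approximate this expression, leading to a differential equation for $p(x,t)$. Given that the dominant contribution to the integral comes from the region around $z = 0$, we can approximate $p(x - z, t)$ by a Taylor series:

$$ p(x - z, t) = p(x,t) - \frac{\partial p}{\partial x}(x,t)\, z + \frac{1}{2}\frac{\partial^2 p}{\partial x^2}(x,t)\, z^2 + O(z^3). \qquad (6.32) $$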


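We also make a Taylor series expansion of $p_s(z, x - z, t)$ in its second argument only: we write

$$ p_s(z, x - z, t) = p_s(z, x, t) - \frac{\partial p_s}{\partial x}(z, x, t)\, z + \frac{1}{2}\frac{\partial^2 p_s}{\partial x^2}(z, x, t)\, z^2 + O(z^3), \qquad (6.33) $$

where it should be understood that the partial derivatives here are with respect to the second argument. Note that this expression is rather different in structure from equation (6.32) in that the coefficients of the Taylor series are themselves functions of $z$, which approach zero as $|z| \to \infty$. We can in fact combine both of these Taylor series together, and write

$$ p_s(z, x - z, t)\, p(x - z, t) = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!} \frac{\partial^k}{\partial x^k}\left[p_s(z, x, t)\, p(x, t)\right] z^k, \qquad (6.34) $$

which is a Taylor series for the function $p_s(z, x - \epsilon, t)\, p(x - \epsilon, t)$ expanded in powers of $\epsilon$, which is then set to $\epsilon = z$. We now substitute this into equation (6.31), and obtain

$$ \begin{aligned}
p(x, t + \delta t) &= \sum_{k=0}^{\infty} \frac{(-1)^k}{k!} \int_{-\infty}^{\infty} dz\, \frac{\partial^k}{\partial x^k}\left[p_s(z, x, t)\, p(x, t)\right] z^k \\
&= \sum_{k=0}^{\infty} \frac{(-1)^k}{k!} \frac{\partial^k}{\partial x^k}\left[p(x, t) \int_{-\infty}^{\infty} dz\, p_s(z, x, t)\, z^k\right]. \qquad (6.35)
\end{aligned} $$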


As indicated in Figure 6.5, close to $z = 0$, $p_s(z, x - z, t)$ is rapidly varying in its first argument but should be slowly varying in $z$ from its second argument. For this reason, $p_s$ is expanded in its second argument only.

In this second line we took everything that is independent of $z$ outside the integral.


We can now define moments M;(x,t) of the probability density p,(z, x,t) 
of displacements z. By analogy with equation (1.31), these moments are 
defined by 


Met) = / dz z"p,(z, x,t). (6.36) 
With this definition, equation (6.35) becomes 
p(xz,t+ ot) = s ay PCM alc eae (6.37) 
7 2s Kk! Oak : 


The $k = 0$ moment is unity because $p_s(z,x,t)$ is normalised. The first and
second moments are obtained from equations (6.25) and (6.26), with $\Delta x$
replaced by $z$. It follows from equation (6.26) that the magnitude of the
typical displacement is $O(\delta t^{1/2})$. We will assume that all of the moments
$M_k(x,t)$ are finite (that is, the integrals (6.36) are convergent), so that
$M_k = O(\delta t^{k/2})$ for $k > 2$. We therefore have

$$\begin{aligned}
M_0(x,t) &= 1, \\
M_1(x,t) &= v(x,t)\,\delta t, \\
M_2(x,t) &= 2D(x,t)\,\delta t + O(\delta t^2), \\
M_k(x,t) &= O(\delta t^{k/2}) \quad (k > 2).
\end{aligned} \qquad (6.38)$$
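As a rough numerical illustration of the scalings quoted for the moments, the sketch below assumes a Gaussian step density with mean $v\,\delta t$ and variance $2D\,\delta t$. This particular form of the step density, the helper names `step_density` and `moment`, and the parameter values are our own assumptions for illustration; the derivation itself only requires the moment conditions.

```python
import math

def step_density(z, v, D, dt):
    """Assumed Gaussian step density with mean v*dt and variance 2*D*dt."""
    var = 2.0 * D * dt
    return math.exp(-(z - v * dt) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def moment(k, v, D, dt, n=4001, width=12.0):
    """Approximate M_k = integral of z**k * p_s(z) dz with the trapezoidal rule."""
    sigma = math.sqrt(2.0 * D * dt)
    lo = v * dt - width * sigma
    hi = v * dt + width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        z = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0  # end points count half
        total += weight * z ** k * step_density(z, v, D, dt)
    return total * h

v, D = 0.5, 1.0  # illustrative drift velocity and diffusion coefficient
for dt in (1e-2, 1e-3):
    M = [moment(k, v, D, dt) for k in range(5)]
    # Print M_0 and the ratios M_k / dt for k = 1..4
    print(dt, M[0], M[1] / dt, M[2] / dt, M[3] / dt, M[4] / dt)
```

For this assumed density, $M_1/\delta t$ equals $v$, $M_2/\delta t$ differs from $2D$ by a term proportional to $\delta t$, and the ratios $M_3/\delta t$ and $M_4/\delta t$ shrink as $\delta t$ is reduced, so only the first three moments contribute in the limit $\delta t \to 0$.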


Substituting these expressions into equation (6.37), and dropping terms in
higher powers of the small quantity $\delta t$, we obtain

$$p(x, t + \delta t) = p(x,t) - \frac{\partial}{\partial x}\bigl[v(x,t)\, p(x,t)\bigr]\,\delta t + \frac{\partial^2}{\partial x^2}\bigl[D(x,t)\, p(x,t)\bigr]\,\delta t + O(\delta t^{3/2}). \qquad (6.39)$$

From a Taylor expansion of $p(x,t)$, equation (6.27), we have

$$p(x, t + \delta t) = p(x,t) + \frac{\partial p}{\partial t}(x,t)\,\delta t + O(\delta t^2). \qquad (6.40)$$

We now take the limit as $\delta t \to 0$. Equating the terms proportional to $\delta t$ in
equations (6.39) and (6.40) gives

$$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}\bigl[v(x,t)\, p\bigr] + \frac{\partial^2}{\partial x^2}\bigl[D(x,t)\, p\bigr]. \qquad (6.41)$$

This equation will be termed the generalised diffusion equation. In many
texts it is called the Fokker–Planck equation. It reduces to the standard
diffusion equation when $v = 0$ and when $D$ is independent of $x$ and $t$.


The derivation above gives a direct route to obtaining the standard diffusion
equation, as opposed to the indirect route discussed in Section 6.3, which
constructs a solution and then observes that it satisfies the diffusion equation.
The construction used in this subsection has the advantage that it is
‘local’, in the sense that $p(x, t + \delta t)$ depends only on $p(x', t)$ for values of $x'$
which are close to $x$. The form of the resulting differential equation in the
vicinity of $x$ therefore depends on properties of the moments $M_k(x,t)$ at $x$,
and is not affected by the boundaries of the system. The derivation given
in this subsection is therefore valid for finite, as well as infinite, systems.
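The link between the random walk and the generalised diffusion equation can be checked with a small simulation. The sketch below is a minimal illustration under our own assumptions: $v$ and $D$ are taken to be constant, each step is drawn from a Gaussian with mean $v\,\delta t$ and variance $2D\,\delta t$, and the function name `simulate` and all parameter values are hypothetical. For constant coefficients the density should then have mean $vt$ and variance $2Dt$.

```python
import math
import random

def simulate(v, D, dt, n_steps, n_walkers, seed=0):
    """Return final positions of walkers taking Gaussian steps (assumed form)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * D * dt)  # standard deviation of one step
    finals = []
    for _ in range(n_walkers):
        x = 0.0
        for _ in range(n_steps):
            x += v * dt + sigma * rng.gauss(0.0, 1.0)
        finals.append(x)
    return finals

v, D = 0.5, 1.0                       # illustrative constant drift and diffusion
dt, n_steps, n_walkers = 0.01, 100, 4000
t = dt * n_steps
finals = simulate(v, D, dt, n_steps, n_walkers)

mean = sum(finals) / n_walkers
variance = sum((x - mean) ** 2 for x in finals) / n_walkers
print("sample mean    ", mean, " expected about", v * t)
print("sample variance", variance, " expected about", 2.0 * D * t)
```

The agreement is only statistical, so the sample values fluctuate around $vt$ and $2Dt$ by an amount that decreases as the number of walkers is increased.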




6.4.4 The Fokker–Planck equation in three dimensions


The calculation leading to the generalised diffusion equation was carried out 
in one space dimension. To understand the form of the generalised diffusion 
equation in three dimensions, we must first consider how the definition of 
the generalised random walk, discussed at the start of this section, extends 
to three dimensions. The continuous random walk in three dimensions was 
already introduced in Exercise 6.1 for the case where there is no drift and 
there is a constant diffusion coefficient D. Here we consider the case where 
there may be both drift and diffusion coefficients which depend upon both 
position and time. We also allow for the possibility that the diffusion process
is not isotropic, that is, that the particles diffuse more easily in some
directions than in others.


In three dimensions, the generalised random walk has a vector displacement
$\Delta\mathbf{r}_n$ at the time step labelled by the integer index $n$. The components can
be labelled by an index $i$, as in Subsection 6.2.2, so that the displacement
may be written

$$\Delta\mathbf{r}_n = (\Delta x_n, \Delta y_n, \Delta z_n) = (\Delta x_{1,n}, \Delta x_{2,n}, \Delta x_{3,n}). \qquad (6.42)$$

The mean displacements will, in general, be different in each direction, so
that

$$\langle \Delta x_{i,n} \rangle = v_i(\mathbf{r},t)\,\delta t, \qquad (6.43)$$

where the drift velocity $\mathbf{v}(\mathbf{r},t) = (v_1, v_2, v_3)$ can be a function of both position
$\mathbf{r}$ and time $t$. We write the difference between the displacement $\Delta x_{i,n}$
and its mean value as $\delta x_{i,n} = \Delta x_{i,n} - v_i\,\delta t$. The fluctuating parts of the
displacements $\delta x_{i,n}$ in different directions may be correlated. In general, we
assume that they satisfy

$$\langle \delta x_{i,n}\, \delta x_{j,m} \rangle = 2 D_{ij}(\mathbf{r},t)\, \delta_{nm}\, \delta t. \qquad (6.44)$$

The factor $\delta_{nm}$ is the Kronecker delta symbol, indicating that for different
time steps ($n \neq m$) the displacements $\delta x_{i,n}$ have no correlation. The coefficients
$D_{ij}$ are elements of a $3 \times 3$ matrix which is termed the diffusion
matrix.
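To make the role of the diffusion matrix concrete, the sketch below generates correlated three-dimensional steps whose fluctuating parts have covariance $2D_{ij}\,\delta t$, and then recovers $D_{ij}$ from the sample covariance. It is an illustration under our own assumptions: the steps are taken to be Gaussian, the matrix `D`, the drift `v` and the helper names `cholesky`, `sample_steps` and `covariance` are hypothetical, and a Cholesky factorisation is simply one convenient way of producing the required correlations.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def sample_steps(D, v, dt, n, seed=1):
    """Draw n Gaussian steps with mean v*dt and covariance 2*D*dt (assumed form)."""
    rng = random.Random(seed)
    cov = [[2.0 * D[i][j] * dt for j in range(3)] for i in range(3)]
    L = cholesky(cov)
    steps = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        steps.append([v[i] * dt + sum(L[i][k] * g[k] for k in range(i + 1))
                      for i in range(3)])
    return steps

def covariance(steps):
    """Sample covariance matrix of a list of three-component steps."""
    n = len(steps)
    mean = [sum(s[i] for s in steps) / n for i in range(3)]
    return [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in steps) / n
             for j in range(3)] for i in range(3)]

# Illustrative anisotropic diffusion matrix and drift velocity
D = [[1.0, 0.3, 0.0], [0.3, 0.5, 0.1], [0.0, 0.1, 0.2]]
v = [0.2, 0.0, -0.1]
dt = 0.01
C = covariance(sample_steps(D, v, dt, 20000))
D_est = [[C[i][j] / (2.0 * dt) for j in range(3)] for i in range(3)]
for row in D_est:
    print(row)
```

The printed matrix should be close to the chosen `D`, including its off-diagonal elements, which describe the correlation between displacements in different directions.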


This three-dimensional version of the generalised random walk can be described
by giving the probability density for a particle to be at position
$\mathbf{r}$ at time $t$. This probability density $p(\mathbf{r},t)$ obeys a partial differential
equation, which is a three-dimensional version of the generalised diffusion
equation (6.41). The derivation follows exactly the same pattern as in the
one-dimensional case, and we shall simply quote the result:

$$\frac{\partial p}{\partial t} = -\sum_{i=1}^{3} \frac{\partial}{\partial x_i}\bigl(v_i(\mathbf{r},t)\, p\bigr) + \sum_{i=1}^{3}\sum_{j=1}^{3} \frac{\partial^2}{\partial x_i\, \partial x_j}\bigl(D_{ij}(\mathbf{r},t)\, p\bigr). \qquad (6.45)$$


Exercise 6.3 


What values of $v_i$ and $D_{ij}$ are appropriate for describing a diffusion process which
is isotropic (occurs at the same rate in all directions), homogeneous (has the same
properties at all points in space), time-independent, and which is without drift in
three dimensions?


Show that substituting these values into equation (6.45) gives the usual
three-dimensional diffusion equation.




Exercise 6.4 


Show that the generalised diffusion equations (6.41) and (6.45) are in the form of
continuity equations.

What is the scalar flux density $J(x,t)$ in the one-dimensional case, and the vector
flux density $\mathbf{J}(\mathbf{r},t)$ in the three-dimensional case?

The continuity equation was discussed in Chapter 3, where the one-dimensional and
three-dimensional forms occur as equations (3.26) and (3.37) respectively.


6.5 Summary and outcomes 


Equation (6.41) gives a partial differential equation defining the probability 
density for any generalised diffusion process in which particles undergo a 
random process that can be described by equations (6.25) and (6.26). The 
derivation was rather complicated, but all the hard work has been done, and 
you can now easily find a partial differential equation for the probability 
density of any generalised diffusion process. 


After studying this chapter, you should: 


• be aware of how the random walk can be generalised to higher dimensions
and also generalised so that it becomes a continuous function of time;

• be aware that the central limit theorem implies that the propagator for
diffusion in an infinite medium is a Gaussian function, and that this
implies that the concentration obeys the diffusion equation;

• be aware of the Fokker–Planck equation as a description of generalised
diffusion processes, and of the method used to derive it.


We conclude this chapter by giving some examples of generalised diffusion 
processes, introduced through a series of exercises in the final section. 


6.6 Further Exercises 


Exercise 6.5 


Determine the form of the generalised diffusion equation in one dimension which
applies when both $D$ and $v$ are constant. Show that

$$p(x,t) = \frac{1}{\sqrt{4\pi D t}} \exp\left(-\frac{(x - x_0 - vt)^2}{4Dt}\right) \qquad (6.46)$$

is a normalised solution, representing the probability density of a particle which is
initially at position $x_0$ when $t = 0$.




Exercise 6.6 


Consider the motion of small particles dispersed in a liquid. If the particles are 
denser than the liquid, they will tend to sink to the bottom. If the particles are 
very small, such as the pollen grains used in experiments to demonstrate Brownian 
motion, this sinking effect is opposed by the random motion of the particles which 
results from molecules of the liquid colliding with the particles. The motion of the 
particles is described by the probability density $p(z,t)$ for a particle to have height
$z$ at time $t$.


The effects of gravity and of the viscosity of the fluid are modelled by assuming that 
the particles drift downwards at a constant velocity v. The effect of the random 
motion of the molecules is modelled by assuming that the particles have a diffusion 
constant D. 


Write down the partial differential equation for the probability density of the
diffusing particles. What is the steady-state solution of this equation, satisfying
$\partial p/\partial t = 0$, in the case where the depth of the liquid is $h$?

Hint: Is this the same as the result from the previous exercise?


Exercise 6.7 


Consider a situation where a particle (moving in one dimension) experiences a 
restoring force when displaced from the origin, such that a particle at position x 
moves with velocity $v = -ax$ (where $a > 0$). The particle also experiences random
displacements which make it move diffusively, with diffusion constant $D$. Use
equation (6.41) to show that the probability density for the particle satisfies

$$\frac{\partial p}{\partial t} = a\,\frac{\partial}{\partial x}(x p) + D\,\frac{\partial^2 p}{\partial x^2}.$$


Solve this equation for a steady-state probability density in which the particle flux 
J is zero, in a region without boundaries. 




Solutions to Exercises in Chapter 6 


Solution 6.1 


Here $x$, $y$ and $z$ are independent random variables. According to equation (1.20),
the joint probability density is therefore a product: $p(x,y,z) = p_1(x)\, p_2(y)\, p_3(z)$,
where $p_1$, $p_2$ and $p_3$ are probability densities for $x$, $y$ and $z$, respectively. The
statistics of the $x$-coordinate displacement specified in the question are exactly the
same as for the one-dimensional continuous random walk treated in Subsection
6.2.1. Its probability density, for particles starting at $x = 0$, is given by equation
(6.12), so

$$p_1(x) = \frac{1}{\sqrt{4\pi D t}} \exp\left(-\frac{x^2}{4Dt}\right).$$

The statistics of $y$ and $z$ are the same as those of $x$; accordingly, the probability
density at time $t$ is

$$p(x,y,z) = \frac{1}{\sqrt{4\pi D t}} \exp\left(-\frac{x^2}{4Dt}\right) \frac{1}{\sqrt{4\pi D t}} \exp\left(-\frac{y^2}{4Dt}\right) \frac{1}{\sqrt{4\pi D t}} \exp\left(-\frac{z^2}{4Dt}\right) = \frac{1}{(4\pi D t)^{3/2}} \exp\left(-\frac{x^2 + y^2 + z^2}{4Dt}\right).$$

This is analogous to the expression obtained in the two-dimensional case in
Exercises 3.2 and 3.24, but here the result is obtained via a more direct route.


Solution 6.2 


Let the typical time between collisions be $\delta t$, and let the typical speed of the
molecules be $v = 260\ \mathrm{m\,s^{-1}}$. If we model the path of a molecule as a random walk,
it is reasonable to suppose that the typical distance travelled between collisions is
$\sigma = v\,\delta t$. Using equation (6.11), we have $\sigma^2 = v^2\,\delta t^2 = 2D\,\delta t$, so $\delta t = 2D/v^2$. The
typical distance between collisions is often referred to as the mean free path, and
given the symbol $\lambda$. Our simple estimates give $\lambda \approx v\,\delta t = 2D/v \approx 10^{-7}$ m. Dividing
by $v$ we have $\delta t \approx 4 \times 10^{-10}$ s, so that the number of collisions per second is
$N = 1/\delta t \approx 2.6 \times 10^{9}$. Approximations based upon using the central limit theorem
are therefore expected to be very precise, because $N$ is so large.
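The arithmetic in this estimate can be reproduced in a few lines. The value of $D$ is not restated in this solution, so the figure used below is an assumption, chosen only because it is consistent with the quoted results; the variable names are our own.

```python
v = 260.0    # typical molecular speed in m/s, as quoted in the solution
D = 1.3e-5   # ASSUMED diffusion coefficient in m^2/s (not stated in this solution)

dt = 2.0 * D / v ** 2   # typical time between collisions, from sigma^2 = 2 D dt
mean_free_path = v * dt  # typical distance between collisions, equal to 2 D / v
N = 1.0 / dt             # number of collisions per second

print("time between collisions (s):", dt)
print("mean free path (m):         ", mean_free_path)
print("collisions per second:      ", N)
```

With this assumed value of $D$, the three printed quantities agree with the orders of magnitude given in the solution.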


Solution 6.3 


If there is no drift, we set $\mathbf{v} = \mathbf{0}$. If the system is homogeneous and time-independent,
the diffusion coefficients $D_{ij}$ are independent of $\mathbf{r}$ and $t$. Because the diffusion is
isotropic, the diffusion matrix must not favour any one direction. Taking the
diffusion matrix to be a multiple of the identity matrix satisfies this requirement,
so that we set $D_{ij} = D\,\delta_{ij}$ (it can be shown that this is the only choice which makes
the diffusion process isotropic). With these choices, equation (6.45) does indeed
simplify to the standard three-dimensional diffusion equation:

$$\frac{\partial p}{\partial t} = D \sum_{i=1}^{3} \frac{\partial^2 p}{\partial x_i^2} = D\,\nabla^2 p.$$




Solution 6.4 


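Equation (6.41) is in the form of a one-dimensional continuity equation,
$\partial p/\partial t + \partial J/\partial x = 0$, with the flux $J$ equal to

$$J = v p - \frac{\partial}{\partial x}(D p).$$

Equation (6.45) is also in the form of a three-dimensional continuity equation
$\partial p/\partial t + \nabla \cdot \mathbf{J} = 0$, with flux density $\mathbf{J}$ having components

$$J_i = v_i\, p - \sum_{j=1}^{3} \frac{\partial}{\partial x_j}(D_{ij}\, p).$$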


Solution 6.5 


Since $v$ and $D$ are constants, they can be taken outside the derivatives in equation
(6.41). The generalised diffusion equation in this case is

$$\frac{\partial p}{\partial t} = -v\,\frac{\partial p}{\partial x} + D\,\frac{\partial^2 p}{\partial x^2}.$$

By differentiating the solution quoted in the exercise, it is seen to satisfy this
differential equation. By integrating over $x$ using a standard Gaussian integral, we
see that this is a normalised solution, which can represent a probability density.
By inspection, we see that this solution approaches zero in the limit as $t \to 0$ at
all points except $x = x_0$, so the solution represents a motion in which the particle
starts from position $x_0$.
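The differentiation and the Gaussian integral can also be checked numerically. The sketch below is a minimal check under our own choices: it approximates the derivatives of the quoted solution by central differences, evaluates the residual $\partial p/\partial t + v\,\partial p/\partial x - D\,\partial^2 p/\partial x^2$ at one sample point, and integrates $p$ over $x$ with the trapezoidal rule. The function names, the sample point and the parameter values are hypothetical.

```python
import math

def p(x, t, x0, v, D):
    """Drifting Gaussian probability density of equation (6.46)."""
    return math.exp(-(x - x0 - v * t) ** 2 / (4.0 * D * t)) / math.sqrt(4.0 * math.pi * D * t)

def residual(x, t, x0, v, D, h=1e-3):
    """Central-difference estimate of dp/dt + v dp/dx - D d2p/dx2."""
    dpdt = (p(x, t + h, x0, v, D) - p(x, t - h, x0, v, D)) / (2.0 * h)
    dpdx = (p(x + h, t, x0, v, D) - p(x - h, t, x0, v, D)) / (2.0 * h)
    d2pdx2 = (p(x + h, t, x0, v, D) - 2.0 * p(x, t, x0, v, D)
              + p(x - h, t, x0, v, D)) / h ** 2
    return dpdt + v * dpdx - D * d2pdx2

def normalisation(t, x0, v, D, lo=-30.0, hi=30.0, n=6000):
    """Trapezoidal estimate of the integral of p over x."""
    step = (hi - lo) / n
    total = 0.5 * (p(lo, t, x0, v, D) + p(hi, t, x0, v, D))
    total += sum(p(lo + i * step, t, x0, v, D) for i in range(1, n))
    return total * step

x0, v, D = 0.0, 0.5, 1.0   # illustrative values
res = residual(0.7, 1.0, x0, v, D)
norm = normalisation(1.0, x0, v, D)
print("residual of the differential equation:", res)
print("integral of p over x:", norm)
```

The residual is not exactly zero because of the finite-difference error, but it should be small compared with the individual terms, and the integral should be close to one.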


Solution 6.6 


The generalised diffusion equation is the same as for the previous exercise, except
that the name of the space coordinate is $z$, and the sign of the velocity $v$ is reversed,
because positive $v$ corresponds to decreasing $z$. The partial differential equation
for $p$ is

$$\frac{\partial p}{\partial t} = \frac{\partial}{\partial z}\left[v p + D\,\frac{\partial p}{\partial z}\right].$$

In the steady state, $p$ becomes independent of time so that $\partial p/\partial t = 0$. The quantity
in square brackets is therefore independent of both $z$ and $t$. Comparison with the
continuity equation (3.26) shows that this quantity is minus the flux density, $J$. If
the liquid is in a finite container, it may be assumed that the flux of particles across
the upper surface of the liquid is zero. The zero-flux condition gives the following
equation for $p(z)$:

$$0 = v p + D\,\frac{dp}{dz}.$$

This equation has the solution

$$p(z) = A \exp(-vz/D),$$

where $A$ is a constant which is determined by normalising the distribution. If the
depth of the liquid is $h$, the normalisation condition is

$$1 = \int_0^h dz\, p(z) = \frac{AD}{v}\bigl(1 - \exp(-vh/D)\bigr).$$

Rearranging this expression to find $A$, we obtain the required solution

$$p(z) = \frac{v \exp(-vz/D)}{D\bigl[1 - \exp(-vh/D)\bigr]}.$$
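As a quick check on this result, the sketch below evaluates the steady-state density for some illustrative values of $v$, $D$ and $h$ (our own choices), integrates it over the depth of the liquid with the trapezoidal rule, and estimates the combination $vp + D\,dp/dz$ by a central difference. The helper names `steady_state` and `integrate` are hypothetical.

```python
import math

def steady_state(z, v, D, h):
    """Steady-state density p(z) for drift speed v, diffusion constant D, depth h."""
    return v * math.exp(-v * z / D) / (D * (1.0 - math.exp(-v * h / D)))

def integrate(f, lo, hi, n=10000):
    """Trapezoidal estimate of the integral of f from lo to hi."""
    step = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * step) for i in range(1, n))
    return total * step

v, D, h = 2.0, 0.5, 3.0   # illustrative values
norm = integrate(lambda z: steady_state(z, v, D, h), 0.0, h)

# Central-difference estimate of v*p + D*dp/dz at a sample height
z0, eps = 1.0, 1e-5
dpdz = (steady_state(z0 + eps, v, D, h) - steady_state(z0 - eps, v, D, h)) / (2.0 * eps)
flux_term = v * steady_state(z0, v, D, h) + D * dpdz

print("integral of p over 0 <= z <= h:", norm)
print("v*p + D*dp/dz at z0:", flux_term)
```

The integral should be close to one and the flux combination close to zero, as required by the normalisation and zero-flux conditions.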




Solution 6.7 


Substituting $v(x) = -ax$ into equation (6.41) and letting $D$ be constant gives the
differential equation quoted in the exercise directly. This equation is in the form of
a continuity equation, with flux density

$$J = -a x p - D\,\frac{\partial p}{\partial x}.$$

When $\partial p/\partial t = 0$ and $J = 0$, this has a Gaussian solution. The solution of this
differential equation was considered in Block I, Chapter 3, Exercise 3.25. The
normalised probability density is then

$$p(x) = \frac{\exp(-a x^2/2D)}{\sqrt{2\pi D/a}}.$$
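The Gaussian steady state has mean zero and variance $D/a$, and this can be compared with a direct simulation of the random walk. The sketch below is an illustration under our own assumptions: each step has drift $-ax\,\delta t$ and a Gaussian fluctuation of variance $2D\,\delta t$, the walkers all start at the origin, and the run is long enough compared with $1/a$ for the steady state to be reached. All names and parameter values are hypothetical.

```python
import math
import random

a, D = 1.0, 0.5                         # illustrative restoring rate and diffusion constant
dt, n_steps, n_walkers = 0.01, 400, 2000
rng = random.Random(42)
sigma = math.sqrt(2.0 * D * dt)         # standard deviation of the random part of a step

positions = []
for _ in range(n_walkers):
    x = 0.0
    for _ in range(n_steps):
        # Drift towards the origin plus a Gaussian random displacement (assumed form)
        x += -a * x * dt + sigma * rng.gauss(0.0, 1.0)
    positions.append(x)

mean = sum(positions) / n_walkers
variance = sum((x - mean) ** 2 for x in positions) / n_walkers
print("sample mean    ", mean, " expected about 0")
print("sample variance", variance, " expected about", D / a)
```

The finite time step introduces a small systematic error in the variance, of relative size about $a\,\delta t/2$, in addition to the statistical scatter from the finite number of walkers.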




Index 




approximate form for probability distribution 57 


atoms 76 
average 22 
Avogadro’s number 80 


bias 47 

binomial coefficient 55, 56 

boundary conditions 
one-dimensional 122 

Brown, Robert 50 

Brownian motion 50 


caloric model 96 
central limit theorem 154, 155 
coin tossing 14 
coin-tossing function 44, 61 
statistics 45 
combination of events 15 
combined event 15 
complementary error function 31 
complex-valued function 165 
concentration 78, 79, 86 
of heat energy 95 
concentration of heat energy 95 
continuity equation 87, 88 
one-dimensional 87 
three-dimensional 89 
continuous random variable 17 
probability density 17 
probability density function 17 
continuous random walk 179 
convolution theorem 158 
correlated random function 61 
correlation 26 
correlation coefficient 26, 46 
for discrete variables 26, 27 
correlation function 46, 61 


degenerate eigenvalues 133 
die (dice) throwing 
moments 24 
probability 14-16 
statistics 23 
Diffusion 76 
diffusion 51, 77
finite medium 
one dimension 121 
two and three dimensions 129 
generalised 185, 186 
infinite medium 116 
probability density 186 
semi-infinite medium 137 
diffusion coefficient 52, 60 
for generalised diffusion 185 
diffusion constant 75, 93 
diffusion equation 60, 75, 93, 183 
generalised 189 
one-dimensional 75, 93 
three-dimensional 75, 94 
diffusion matrix 190 


diffusion—advection equation 101 
Dirichlet boundary condition 136 
discrete events 13 

discrete random variable 21 


Distribution, sums of random variables 158 


divergence 90 
doubly degenerate 133 
drift 53, 54 
drift velocity 53 
for generalised diffusion 185 


eigenfunctions 123 
completeness 136 
normalised 128, 134 
orthogonality of 126 

eigenvalues 123 

Einstein, Albert 51 

electricity 51 

element of probability 17 

elementary outcome 15 

empirical value 14 

energy 95

error function 30, 31 
complementary 31 

event 13 
combined 15 
discrete 13 
mutually exclusive 15 

expectation value 23 
for two variables 25 

exponential distribution 19, 21, 24 

exponential function 63 


factorial 56 

Fick’s law 93 

Fick, Adolf E. 93 

flux 81, 83, 85, 86

flux density 78, 81, 86 
of heat energy 96 
one-dimensional 85
scalar 82 
vector 83 

Fokker—Planck equation 185, 189 
derivation 187 
in three dimensions 190 


Fourier transform of a probability density 166 


Fourier’s law of heat conduction 96 
frequency of an outcome 13 
friction 77 


gambler’s fortune 51 

gas 51

Gauss’s theorem 91, 92 

Gaussian approximation 163 
complex-valued function 165 

Gaussian distribution 28, 60 

Gaussian function 28, 30, 31, 63 

Gaussian random variable 160 

generalised diffusion 185 

generalised diffusion equation 189 




generalised Fourier series 134 
generalised random walk 186 


Gibbs, Josiah Willard 51 


heat 96 
heat equation 95, 96 
Heaviside function 120 
Helmholtz equation 130 
eigenvalues and eigenfunctions of 130 
Helmholtz, Hermann von 130 
histogram 28, 29 
homogeneous 96 
homogeneous system 78 


independent events 16 
independent random variables 21 
induction, proof by 155, 171, 173 
initial conditions 122 

integer part function 58 
isotropic 96 


joint probability 21 
joint probability density 20, 27 


Kronecker delta symbol 46 


Laplacian operator 75 
linearly independent functions 133 
Lorentzian function 163 


Mach, Ernst 51 
mass density 90 
mass-specific heat capacity 95 
mean value 23, 27 
of a normal distribution 30 
mol 80 
molar concentration 80 
molecules 76 
moments 
in throwing a die (dice) 24 
of a normal distribution 29 
of a random variable 24
mutually exclusive events 15 


Neumann boundary condition 94, 130 
normal distribution 28-32 

mean 30 

moments of 29 

probability density function 30 

standard deviation 30 

variance 30 
normal distribution function 31 
normal probability density function 30 
normalisation condition 18 
normalised function 18 
normalised solution 98 


orthogonality 127 
orthogonality relation 124 
one-dimensional 128 
two- and three-dimensional 134 
orthonormal 128 
outcome 13 




elementary 15 


parity 56 
Pascal, Blaise 55 
Pascal’s triangle 55 
Perrin, Jean 51 
playing card probabilities 15, 16 
Poisson distribution 34, 157, 170 
price changes 51 
probability 13-22 
in coin tossing 14 
in drawing playing cards 15, 16 
in successive trials 16 
in throwing a die (dice) 14-16 
of combined events 15 
of mutually exclusive events 15 
probability density 17 
exponential distribution 19, 21, 24 
uniform distribution 18 
probability density function 
joint 20, 27
of a continuous random variable 17 
of a normal distribution 30 
probability distribution 
approximate form 57 
of a random walk 54
propagator 117 


random function 44 
with correlations 61, 62 
random variable 22 
continuous 17 
discrete 21 
independent 21 
two or more 20, 25 
random walk 48, 50, 180 
continuous 179 
generalised 186 
one-dimensional 179 
probability distribution 54 
simple 48, 49 
statistics 52 
three-dimensional 181 
two-dimensional 181, 182 
with drift 53, 54 
rate constant 118 
rate constant (Poisson distribution) 34 
realisation 44 
recurrence relation 55 
relative error 58 
running average 62 


semi-infinite medium 115 
separation constant 122 
separation of variables 122 
simple random walk 48, 49 
sinc 163 
special function 30 
spectrum 130 
standard deviation 24 

of a normal distribution 30 
star motion 51
statistics 22-28 




in throwing a die (dice) 23
of a random walk 52
of the coin-tossing function 45


steady flow 81, 90 
steady state 90 

step function 120 
Stirling’s formula 58 
stochastic process 44 
successive trials 16 


Taylor series 60 
temperature 95 
temperature records 61 
temperature waves 137 
thermal conduction 96 




thermal conductivity 96 

thermal convection 77 

thermal diffusivity 96 

thermal insulation 121 

thermal isolation 121 

three-dimensional continuity equation 90 
trial 13


uniform distribution 18 


variance 24, 52 
of a normal distribution 30 
vector flux density 83 


weight function 63 


MS324 Waves, diffusion and variational principles

Block 0 Preparatory material
Block I Waves
Block II Random walks and diffusion
Block III Variational principles

MS324 Block II
ISBN 978 0 7492 51680


